Hello,

I am now using Solr 1.3 with Tomcat 6 on a Debian lenny box. Could you please point me to any other instructions/HowTos available online for integrating Tika, or maybe RichDocumentHandler, with Solr? I mean apart from the Solr Wiki, as following those examples did not help in my case.
Thank you.
Veselin K.

On Wed, Dec 17, 2008 at 10:43:57AM +0000, Veselin K wrote:
> Thank you Erik, Hoss.
>
> - If using either Solr's "stream.file" or Nutch's crawler,
>   what is the procedure for adding new files?
>   That is to say, if I did not know which are the new files in a
>   specific folder and thus I passed all files to Solr/Nutch, would it
>   skip the ones that have already been indexed?
>
> - Also, what if a file gets modified, would Solr/Nutch detect
>   the change and re-index just the modified file?
>   Or should some kind of cache be cleared and everything re-indexed?
>
> - In order to provide the user with an option to search the indexes of
>   two separate Solr/Nutch servers, do I need to link both servers
>   somehow and join their indexes into one, or is it just a question of
>   designing the web front-end so that it offers the choice to send your
>   search query to one or multiple different servers?
>
> Thank you,
> Veselin K
>
> On Sun, Dec 14, 2008 at 11:22:00AM -0800, Chris Hostetter wrote:
> >
> > : the easiest way to get rolling. A simple script that recurses your folders
> > : and issues a simple request posting each file in turn to Solr will give you a
> > : full text searchable index in no time (well, ok, it'll take a little time, but
> > : it'll be as fast as anything else out there).
> >
> > if all the files are "local" on the machine that Solr is running on you
> > don't even need to POST them, Solr can be configured to read the files by
> > local filename using the "stream.file" param...
> >
> > http://wiki.apache.org/solr/ContentStream
> >
> > that said: if your fileserver implementation already exposes all of the
> > files over HTTP, then using Nutch and its crawler might be an easier way
> > to get started on indexing all of them ... hard to say without being in
> > your shoes. you may want to experiment with both.
> >
> > -Hoss
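
For reference, the "simple script that recurses your folders" idea from the quoted thread, combined with the "stream.file" param, might look roughly like the Python sketch below. It is only an illustration under several assumptions: the handler URL is a made-up placeholder (the real path depends on which rich-document handler is registered in solrconfig.xml), any extra parameters that handler needs (such as a document id) are left out, and the JSON file of modification times is just one hypothetical way for the script itself to remember what it has already sent, so that only new or modified files are passed on the next run.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: placeholder URL, not a real Solr path. Substitute the path of
# whatever rich-document handler is configured in your solrconfig.xml.
HANDLER_URL = "http://localhost:8080/solr/update/hypothetical-rich-handler"


def find_changed_files(root, seen):
    """Yield (path, mtime) for files under root that are new or modified.

    `seen` maps absolute path -> modification time recorded on an earlier run.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.abspath(os.path.join(dirpath, name))
            mtime = os.path.getmtime(path)
            if seen.get(path) != mtime:
                yield path, mtime


def build_request_url(path, handler_url=HANDLER_URL):
    """Build a request URL asking Solr to read a local file via stream.file."""
    return handler_url + "?" + urlencode({"stream.file": path})


def index_folder(root, state_file, handler_url=HANDLER_URL):
    """Send each new or modified file under root to Solr, then save the state."""
    seen = {}
    if os.path.exists(state_file):
        with open(state_file) as fh:
            seen = json.load(fh)
    for path, mtime in find_changed_files(root, seen):
        # urlopen raises an HTTPError if Solr answers with an error status.
        with urlopen(build_request_url(path, handler_url)) as response:
            response.read()
        seen[path] = mtime
    with open(state_file, "w") as fh:
        json.dump(seen, fh)
```

A call such as index_folder("/srv/files", "seen.json") (both names hypothetical) would then send only the files whose modification time differs from the recorded one. Note that "stream.file" only works if remote streaming is enabled in solrconfig.xml, and a commit is still needed before the new documents become searchable.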
