Thank you Erik, Hoss.

- If using either Solr's "stream.file" or Nutch's crawler, what is the procedure for adding new files? That is to say, if I did not know which files in a given folder are new and simply passed all of them to Solr/Nutch, would it skip the ones that have already been indexed?
- Also, if a file gets modified, would Solr/Nutch detect the change and re-index just that modified file? Or does some kind of cache need to be cleared and everything re-indexed?

- In order to give the user the option of searching the indexes of two separate Solr/Nutch servers, do I need to link both servers somehow and join their indexes into one, or is it just a question of designing the web front-end so that it offers the choice of sending the search query to one or more servers?

Thank you,
Veselin K

On Sun, Dec 14, 2008 at 11:22:00AM -0800, Chris Hostetter wrote:
>
> : the easiest way to get rolling. A simple script that recurses your folders
> : and issues a simple request posting each file in turn to Solr will give you a
> : full text searchable index in no time (well, ok, it'll take a little time,
> : but it'll be as fast as anything else out there).
>
> if all the files are "local" on the machine that Solr is running on you
> don't even need to POST them, Solr can be configured to read the files by
> local filename using the "stream.file" param...
>
> http://wiki.apache.org/solr/ContentStream
>
> that said: if your fileserver implementation already exposes all of the
> files over HTTP, then using Nutch and its crawler might be an easier way
> to get started on indexing all of them ... hard to say without being in
> your shoes. you may want to experiment with both.
>
>
> -Hoss
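As a sketch of the stream.file approach Hoss describes: instead of POSTing a document's bytes, you hand Solr a local path and it reads the file itself. The host, port, and file path below are hypothetical placeholders, and remote streaming has to be enabled (enableRemoteStreaming="true" on requestParsers in solrconfig.xml) for this to work.

```shell
# Build the stream.file request for a local Solr XML document
# (an <add><doc>...</doc></add> payload). Host and path are hypothetical.
SOLR=http://localhost:8983/solr
FILE=/data/docs/doc1.xml
URL="$SOLR/update?stream.file=$FILE&commit=true"
echo "$URL"
# Then issue it with:  curl "$URL"
```

Note that Solr only reads the file; the path must be visible from the machine Solr runs on, not from the client issuing the request.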

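Erik's "simple script that recurses your folders" could look something like the sketch below, combined with the stream.file trick: walk a directory tree, issue one update request per file, then commit once at the end. It prints the commands rather than running them (swap echo for curl, or pipe to sh, to actually index); the Solr URL and the *.xml filter are assumptions to adjust for your setup.

```shell
# index_dir: recurse a folder and print one stream.file update request per
# XML file found, followed by a single commit request. Hypothetical host;
# pipe the output to sh (or replace echo with curl) to actually index.
index_dir() {
  find "$1" -type f -name '*.xml' | while read -r f; do
    echo curl -s "http://localhost:8983/solr/update?stream.file=$f"
  done
  echo curl -s "http://localhost:8983/solr/update?commit=true"
}
# Example: index_dir /data/docs | sh
```

Committing once after the loop, rather than per file, keeps the run fast; a commit per document would make Solr reopen its searcher every time.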