Can you provide details about the parts of the examples that weren't clear? Perhaps I can clean up the docs or help you figure it out.

-Grant

On Dec 27, 2008, at 3:42 PM, Veselin Kantsev wrote:

Hello,
I am now using solr 1.3 with tomcat6 on a debian lenny box.

Could you please point me to any other instructions/HowTos available online for
integrating Tika (or maybe RichDocumentHandler) with Solr?
Apart from the Solr Wiki, that is, as following those examples did not help in my
case.


Thank you.

Veselin K.


On Wed, Dec 17, 2008 at 10:43:57AM +0000, Veselin K wrote:
Thank you Erik, Hoss.

- If using either Solr's "stream.file" or Nutch's crawler,
 what is the procedure for adding new files?
 That is to say, if I did not know which files in a
 specific folder were new and thus passed all files to Solr/Nutch, would it
 skip the ones that have already been indexed?

- Also, if a file gets modified, would Solr/Nutch detect
 the change and re-index just that modified file?
 Or should some kind of cache be cleared and everything re-indexed?

- In order to provide the user with an option to search the indexes of
 two separate Solr/Nutch servers, do I need to link both servers
 somehow and join their indexes into one, or is it just a question of
 designing the web front-end so that it offers the choice to send your
 search query to one or multiple different servers?


Thank you,
Veselin K


On Sun, Dec 14, 2008 at 11:22:00AM -0800, Chris Hostetter wrote:

: the easiest way to get rolling. A simple script that recurses your folders
: and issues a simple request posting each file in turn to Solr will give you a
: full text searchable index in no time (well, ok, it'll take a little time, but
: it'll be as fast as anything else out there).
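(For reference, a minimal sketch of such a recurse-and-post script. The Solr URL is an assumption, as is the premise that the files are already in Solr's XML add format; adjust both for your own setup.)

```shell
#!/bin/sh
# Recurse a folder, post every .xml file to Solr's update handler,
# then commit once so the new documents become searchable.
# SOLR_URL is an assumption -- point it at your own server.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/update}"

post_file() {
    # Send one body to Solr: either @/path/to/file or a literal
    # command such as '<commit/>'.
    curl -s -H 'Content-Type: text/xml' --data-binary "$1" "$SOLR_URL"
}

post_dir() {
    # Walk the given directory tree and post each XML file in turn.
    find "$1" -type f -name '*.xml' | while IFS= read -r f; do
        echo "Posting $f"
        post_file @"$f"
    done
    post_file '<commit/>'
}
```

Usage would be e.g. `post_dir /path/to/docs`.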

if all the files are "local" on the machine that Solr is running on, you
don't even need to POST them; Solr can be configured to read the files by
local filename using the "stream.file" param...

        http://wiki.apache.org/solr/ContentStream
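(A minimal sketch of such a request, assuming a Solr update handler at the usual example URL and an example file path; note that "stream.file" requires enableRemoteStreaming="true" in solrconfig.xml.)

```shell
# stream_file: ask Solr to read a file from its own local disk via the
# stream.file parameter, instead of receiving the body in a POST.
# Requires enableRemoteStreaming="true" in solrconfig.xml.
# SOLR_URL and the path below are examples only.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/update}"

stream_file() {
    # $1: absolute path of the file, as seen by the Solr server itself
    curl -s "$SOLR_URL?stream.file=$1&commit=true"
}
```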

that said: if your fileserver implementation already exposes all of the
files over HTTP, then using Nutch and its crawler might be an easier way
to get started on indexing all of them ... hard to say without being in
your shoes.  You may want to experiment with both.



-Hoss


--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
