Can you provide details about the parts of the examples that weren't
clear? Perhaps I can clean up the docs or help you figure it out.
-Grant
On Dec 27, 2008, at 3:42 PM, Veselin Kantsev wrote:
Hello,
I am now using Solr 1.3 with Tomcat 6 on a Debian Lenny box.
Could you please point me to any other instructions/HowTos online on
integrating Tika or perhaps the RichDocumentHandler with Solr?
Apart from the Solr Wiki, that is, as following those examples did not
help in my case.
Thank you.
Veselin K.
On Wed, Dec 17, 2008 at 10:43:57AM +0000, Veselin K wrote:
Thank you Erik, Hoss.
- If using either Solr's "stream.file" or Nutch's crawler,
what is the procedure for adding new files?
That is to say, if I did not know which files in a specific folder were
new and thus passed all files to Solr/Nutch, would it skip the ones
that have already been indexed?
- Also, what if a file gets modified? Would Solr/Nutch detect the
change and re-index just that modified file?
Or should some kind of cache be cleared and everything re-indexed?
- In order to give the user the option to search the indexes of two
separate Solr/Nutch servers, do I need to link both servers somehow
and join their indexes into one, or is it just a question of designing
the web front-end so that it offers the choice of sending the search
query to one or more different servers?
Thank you,
Veselin K
On Sun, Dec 14, 2008 at 11:22:00AM -0800, Chris Hostetter wrote:
: the easiest way to get rolling. A simple script that recurses your
: folders and issues a simple request posting each file in turn to Solr
: will give you a full text searchable index in no time (well, ok, it'll
: take a little time, but it'll be as fast as anything else out there).
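The "simple script" quoted above can be little more than a find loop; here is a minimal sketch in sh, assuming the extracting handler (Solr Cell) is mapped at /update/extract and using each file's path as its unique id, so that re-posting a modified file overwrites the old document instead of duplicating it (the URL, handler path, and id scheme are assumptions to adapt to your setup):

```shell
#!/bin/sh
# Sketch: print one curl command per file under a directory.
# SOLR_URL is an assumed handler mapping -- check your solrconfig.xml.
SOLR_URL="http://localhost:8983/solr/update/extract"

post_all() {
  dir="$1"
  # One request per regular file; the path doubles as the unique id.
  find "$dir" -type f | sort | while read -r f; do
    echo curl -s "$SOLR_URL?literal.id=$f" -F "file=@$f"
  done
}

# Drop the leading "echo" in post_all to actually send the requests,
# then commit once so the new documents become searchable:
#   curl -s "http://localhost:8983/solr/update" -d '<commit/>'
```

Note the single commit at the end: committing after every file would slow indexing down considerably.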
if all the files are "local" on the machine that Solr is running on,
you don't even need to POST them; Solr can be configured to read the
files by local filename using the "stream.file" param...
http://wiki.apache.org/solr/ContentStream
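For a concrete sketch of the stream.file route: remote streaming must first be enabled via enableRemoteStreaming="true" on the requestParsers element in solrconfig.xml, and the handler path below (/update/extract) is an assumption to match to your own config:

```shell
# Build the stream.file URL for a local path; Solr then reads the file
# itself, so nothing is uploaded over the wire.
# Host, port, and handler path are assumptions for illustration.
stream_file_url() {
  echo "http://localhost:8983/solr/update/extract?stream.file=$1&literal.id=$1"
}

# Usage (requires enableRemoteStreaming="true" in solrconfig.xml):
#   curl "$(stream_file_url /var/docs/report.pdf)"
#   curl "http://localhost:8983/solr/update" -d '<commit/>'
```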
that said: if your fileserver implementation already exposes all of
the files over HTTP, then using Nutch and its crawler might be an
easier way to get started on indexing all of them ... hard to say
without being in your shoes. You may want to experiment with both.
-Hoss
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ