The trunk of Solr, with the new ExtractingRequestHandler (Tika), will surely be the easiest way to get rolling. A simple script that recurses through your folders and posts each file to Solr in turn will give you a full-text searchable index in no time (well, OK, it'll take a little time, but it'll be as fast as anything else out there).
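
A minimal sketch of such a script, in Python, might look like the following. The Solr URL, the /update/extract handler path, the literal.id parameter, and the extension list are all assumptions (they follow the documented Solr Cell conventions, but your trunk build may name or configure them differently), and the function names are made up for illustration.

```python
import mimetypes
import os
import urllib.parse
import urllib.request

# Assumed values: adjust both to match your own Solr install and file types.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"
EXTENSIONS = {".pdf", ".doc", ".txt"}


def find_documents(root, extensions=EXTENSIONS):
    """Yield every file under root whose extension is in extensions."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in extensions:
                yield os.path.join(dirpath, name)


def build_extract_url(path, base_url=SOLR_EXTRACT_URL):
    """Build the request URL, using the file path as the document id."""
    # literal.id is assumed to map onto the schema's unique key field.
    query = urllib.parse.urlencode({"literal.id": path})
    return f"{base_url}?{query}"


def post_file(path, base_url=SOLR_EXTRACT_URL):
    """POST the raw file body to Solr and return the HTTP status code."""
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as fh:
        request = urllib.request.Request(
            build_extract_url(path, base_url),
            data=fh.read(),
            headers={"Content-Type": content_type},
            method="POST",
        )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_tree(root):
    """Post each matching file under root in turn, one request per file."""
    for path in find_documents(root):
        post_file(path)
```

Calling index_tree("/path/to/fileserver") would walk the tree and send one request per document. Nothing becomes searchable until a commit is issued, so a real script would also commit at the end (or every few thousand files) and would probably log and skip files that Solr rejects rather than stopping on the first error.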

        Erik

On Dec 14, 2008, at 9:15 AM, Veselin Kantsev wrote:

Hello,
first of all, thanks for these great projects.
I discovered Lucene and its subprojects a day ago, and they all seem amazing.

My goal:
--------
A file server with numerous folders containing documents (pdf, doc, txt, etc.)
that need to be indexed and searchable via a web interface or similar.
The number of files might be from 500 000 to 1 000 000 or so.
Ideally the solution would be capable of handling a lot more than that,
in case of future growth.

My question:
------------
Which of the projects (Lucene, Solr, Nutch) will be most suitable in my case?

Thank you very much.

--
Veselin K
