Solr trunk, with the new ExtractingRequestHandler (built on Tika), will
surely be the easiest way to get rolling. A simple script that
recurses through your folders and posts each file in turn to Solr
will give you a full-text searchable index in no time
(well, ok, it'll take a little time, but it'll be as fast as anything
else out there).
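A minimal sketch of such a script, under several assumptions that are not part of the original post: the handler is mapped at `http://localhost:8983/solr/update/extract` (as in the example solrconfig.xml), plain XML update messages go to `http://localhost:8983/solr/update`, the schema's uniqueKey is `id`, and the handler accepts a `literal.id` parameter (parameter names have changed between builds of the handler, so check the wiki for your version). Using the absolute file path as the document id is a hypothetical choice for illustration.

```python
import mimetypes
import os
import urllib.parse
import urllib.request

# Assumed endpoints; adjust to match your Solr installation.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"

# Hypothetical filter: only the file types mentioned in the question.
EXTENSIONS = {".pdf", ".doc", ".txt"}


def iter_documents(root, extensions=EXTENSIONS):
    """Recursively yield paths of files under root with a matching extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in extensions:
                yield os.path.join(dirpath, name)


def build_request(path, solr_url=SOLR_EXTRACT_URL):
    """Build a POST that sends one file's raw bytes to the extract handler.

    The absolute path is used as the unique id via literal.id (assumed
    parameter name; older handler builds used different prefixes).
    """
    query = urllib.parse.urlencode({"literal.id": os.path.abspath(path)})
    with open(path, "rb") as f:
        body = f.read()
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    return urllib.request.Request(
        solr_url + "?" + query,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def commit(update_url=SOLR_UPDATE_URL):
    """Send a single XML <commit/> message so the new documents become searchable."""
    req = urllib.request.Request(
        update_url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def index_tree(root, extract_url=SOLR_EXTRACT_URL, update_url=SOLR_UPDATE_URL):
    """Post every matching file under root, then commit once; return the count."""
    count = 0
    for path in iter_documents(root):
        with urllib.request.urlopen(build_request(path, extract_url)) as resp:
            resp.read()
        count += 1
    # One commit after the whole batch is far cheaper than committing per file.
    commit(update_url)
    return count
```

Usage would be something like `index_tree("/path/to/fileserver")`. Committing once at the end rather than per file is a deliberate choice: with hundreds of thousands of documents, per-file commits would dominate the indexing time.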
Erik
On Dec 14, 2008, at 9:15 AM, Veselin Kantsev wrote:
Hello,
First of all, thanks for these great projects.
I discovered Lucene and its subprojects a day ago, and they all seem
amazing.
My goal:
--------
A file server with numerous folders containing documents
(PDF, DOC, TXT, etc.)
that need to be indexed and made searchable via a web interface or similar.
The number of files might be anywhere from 500,000 to 1,000,000.
Ideally the solution would be able to handle a lot more than that,
in case of future growth.
My question:
------------
Which of the projects (Lucene, Solr, Nutch) would be most suitable for
my case?
Thank you very much.
--
Veselin K