Re: Indexing local PDFs: Lucene/Solr/Nutch ?

Erik Hatcher Sun, 14 Dec 2008 06:56:22 -0800

The trunk of Solr with the new ExtractingRequestHandler (Tika) will surely be the easiest way to get rolling. A simple script that recurses your folders and issues a simple request posting each file in turn to Solr will give you a full-text searchable index in no time (well, ok, it'll take a little time, but it'll be as fast as anything else out there).
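
A minimal sketch of such a script in Python, under stated assumptions: a Solr instance at localhost:8983 with the extracting handler mapped to /update/extract, accepting a literal.id parameter for the document id and a commit parameter. The URL, the parameter names, and every function name below are illustrative, so check them against your own Solr configuration before relying on them.

```python
import os
import urllib.parse
import urllib.request

# Assumption: where the extracting handler is mapped in your solrconfig.xml.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def iter_files(root):
    """Yield the path of every regular file under root, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def build_extract_url(base_url, doc_id, commit=False):
    """Build the request URL. 'literal.id' and 'commit' are assumed parameter names."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def post_file(path, base_url=SOLR_EXTRACT_URL):
    """POST one file's raw bytes to Solr and return the HTTP status code."""
    with open(path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        build_extract_url(base_url, path),  # the file path doubles as the id
        data=body,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_tree(root, base_url=SOLR_EXTRACT_URL):
    """Post every file under root; collect failures instead of stopping."""
    failed = []
    for path in iter_files(root):
        try:
            post_file(path, base_url)
        except OSError as error:  # urllib's URLError/HTTPError derive from OSError
            failed.append((path, str(error)))
    return failed
```

Calling index_tree("/path/to/documents") would post each file in turn and return the list of files that failed. This sketch never sets commit itself, so a final commit request is still needed before the documents become searchable; committing once at the end is usually cheaper than committing per file.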


        Erik


On Dec 14, 2008, at 9:15 AM, Veselin Kantsev wrote:

Hello,
first of all, thanks for these great projects.
I discovered Lucene and its subprojects a day ago, and they all seem amazing.
My goal:
--------
A file server with numerous folders containing documents (pdf, doc, txt, etc.)
that need to be indexed and searchable via a web interface or similar.
The number of files might be from 500 000 to 1 000 000 or so.
Ideally the solution would be capable of handling a lot more than that,
in case of future growth.

My question:
------------
Which of the projects (Lucene, Solr, Nutch) will be most suitable in my case?
Thank you very much.

--
Veselin K

Re: Indexing local PDFs: Lucene/Solr/Nutch ?
