Hi Michael,
I wonder whether you would be interested in cooperating on the extraction and index management part. We use Lucene and our own extractor plugins in a Swing application:
http://tockit.sf.net/docco
Code can be found here:
http://cvs.sourceforge.net/viewcvs.py/toscanaj/docco/
It is BSD-style licensed. You have done some things we have not (Zip/Rar and CHM, for example), while we have simple OOo support:
http://cvs.sourceforge.net/viewcvs.py/toscanaj/docco/source/org/tockit/docco/documenthandler/OpenOfficeDocumentHandler.java?rev=1.4&view=auto
We also have dynamic class loading, although only with a very simple approach; we were considering moving on to Eclipse's model.
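To make the "very simple approach" concrete, here is a minimal sketch of loading handler classes by name through reflection. It is not Docco's actual code: the `TextHandler` interface, `PlainTextHandler` and `HandlerLoader` are hypothetical names chosen for illustration, and the class names would presumably come from a configuration file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plugin contract; the real Docco interface may look different.
interface TextHandler {
    String describe();
}

// Hypothetical built-in handler used to demonstrate the loader.
class PlainTextHandler implements TextHandler {
    public String describe() {
        return "plain text";
    }
}

class HandlerLoader {
    // Instantiates every named class that implements TextHandler and
    // silently skips names that cannot be loaded or instantiated.
    static List<TextHandler> load(List<String> classNames) {
        List<TextHandler> handlers = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class<?> cls = Class.forName(name);
                handlers.add(cls.asSubclass(TextHandler.class)
                        .getDeclaredConstructor()
                        .newInstance());
            } catch (ReflectiveOperationException | ClassCastException e) {
                // Unknown or incompatible class: ignore it and carry on.
            }
        }
        return handlers;
    }
}
```

A model like Eclipse's would add things this sketch leaves out, such as separate class loaders per plugin and declared extension points.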
One thing I wanted to look at is PPT support -- but I don't know when. The thing is that I am running this little project on my own at the moment and it is not the highest priority. But I'd be happy to synchronize some aspects with your project, since I think it would be very valuable to have a generic document indexing framework that can be used in any Lucene application.
Cheers, Peter
Zilverline info wrote:
All,
I've just released a new candidate (*1.0-rc3*) that now supports plugins. You can create your own extractors
for various file formats. I've provided Extractors for Text, PDF, Word, and HTML.
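As a rough illustration of what such an extractor plugin mechanism could look like, the sketch below maps file extensions to extractors. It does not show Zilverline's real API: the `Extractor` interface and `ExtractorRegistry` class are hypothetical names, and a real extractor would parse the format rather than just decode bytes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical extractor contract: turn raw file content into indexable text.
interface Extractor {
    String extractText(byte[] content);
}

class ExtractorRegistry {
    private final Map<String, Extractor> byExtension = new HashMap<>();

    // Registers an extractor for a file extension such as "txt" or "pdf".
    void register(String extension, Extractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Looks up the extractor by the file name's extension, ignoring case.
    Optional<Extractor> forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(byExtension.get(ext));
    }
}
```

The text returned by the chosen extractor would then be handed to the indexer; files with no registered extractor can simply be skipped.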
It's also possible to specify your own handlers for archives. Say you have a RAR archive, and you have a program on your system that can extract the content from it, then you can specify that zilverline should use this program.
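One way such an archive handler might work is sketched below: a command template is filled in with the archive and a target directory, and the external program is then run as a child process. The class name `ExternalArchiveHandler` and the `{archive}` and `{dir}` placeholders are assumptions made for this example, not Zilverline's actual configuration syntax.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ExternalArchiveHandler {
    private final List<String> template;

    // The template is the program name plus its arguments, where the
    // hypothetical placeholders {archive} and {dir} are substituted later.
    ExternalArchiveHandler(List<String> template) {
        this.template = new ArrayList<>(template);
    }

    // Builds the concrete command line for one archive.
    List<String> buildCommand(Path archive, Path targetDir) {
        List<String> command = new ArrayList<>();
        for (String part : template) {
            command.add(part
                    .replace("{archive}", archive.toString())
                    .replace("{dir}", targetDir.toString()));
        }
        return command;
    }

    // Runs the external program and returns its exit code.
    int unpack(Path archive, Path targetDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(archive, targetDir))
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

After a successful run, the files unpacked into the target directory could be passed through the ordinary extractors like any other documents.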
Zilverline is a free search engine based on Lucene that's ready to roll and can simply be dropped into a Servlet
Engine. It runs out of the box on both Windows and Linux, supports PDF, WORD, HTM, TXT, and CHM, and can index zip, rar, and many other formats.
Please take a look at http://www.zilverline.org, and have a swing at it.
cheers,
Michael Franken
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
