There is some ongoing work at nutch.org.
Maybe we can bundle all the work together, since it is all open source?
Nutch already has a Java *.doc and *.pdf parser as well.

Stefan

Pete Lewis wrote:

Hi Stefan

Using OpenOffice will enable you to parse 182 file formats, but it's not a
pure Java solution and you still need an alternate solution for PDFs.

I'd be interested in knowing whether anyone is working on a pure Java
solution that would give us a single method for handling MS Office
documents, PDFs, etc.

Cheers

Pete

----- Original Message -----
From: "Stefan Groschupf" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, November 05, 2003 10:26 AM
Subject: Re: Index entire filesystem





I wrote to this list a few days ago to announce a way to
parse 182 file formats.
There was a small bug report a few days ago; I hope I can fix it.

Browse the archive to figure out more.

Cheers
Stefan

Marcel Stor wrote:



Hi all,

I'm thinkin' about writing a search tool for my filesystem. I know such
things exist already but programming it myself is much more fun ;-)
So, I would have Lucene crawl through my filesystem and pass each file
to an appropriate indexer (PDF -> PDFbox, etc.). Yes, I run a Windows
system and would depend on the file ending to distinguish the file type.
Is this a good idea in general? Is there a list of available indexers for
the different file types? Any other comments are also welcome.
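
The approach described above (walk the filesystem, pick a handler by file extension) could be sketched roughly as follows. This is only a minimal illustration using the Java standard library: the class name `ExtensionDispatcher` and its methods are hypothetical, and the extractor functions are placeholders for where a real parser such as PDFBox would be called. Adding the extracted text to a Lucene index is left out on purpose.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical sketch: walk a directory tree and choose a text extractor
// by file extension. None of these names come from Lucene or Nutch.
class ExtensionDispatcher {

    // Maps a lower-case extension (without the dot) to a function that
    // turns a file into plain text. Real extractors would call a parser.
    private final Map<String, Function<Path, String>> extractors = new HashMap<>();

    // Registers an extractor for one extension; matching is case-insensitive.
    void register(String extension, Function<Path, String> extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Returns the lower-case extension without the dot, or "" if the name
    // has no extension (this also covers names like ".bashrc" or "file.").
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Walks the tree below root and returns path -> extracted text for every
    // regular file whose extension has a registered extractor. Files with
    // unknown extensions are skipped.
    Map<Path, String> extractAll(Path root) throws IOException {
        List<Path> files = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(files::add);
        }
        Map<Path, String> result = new HashMap<>();
        for (Path file : files) {
            String ext = extensionOf(file.getFileName().toString());
            Function<Path, String> extractor = extractors.get(ext);
            if (extractor != null) {
                result.put(file, extractor.apply(file));
            }
        }
        return result;
    }
}
```

A caller would register one extractor per extension, for example `dispatcher.register("pdf", path -> ...)`, and then pass each returned text to whatever indexing code is in use. Relying on the extension alone is simple, but it will misclassify files that are named incorrectly.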

Regards,
Marcel


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]







