> I have been looking for PDF and Word document parsers.  I have tried the
> contributions page on the Lucene site as suggested by a Lucene User. The
> PJEtymon does not have a Windows version.  The XPDF does not do the parsing
> very well.

I've run Etymon with some degree of success on Windows boxes. To parse Word
documents, have a look at OpenOffice. You can start OpenOffice so that it
listens for socket connections. From your Java app, you open a connection to
OpenOffice (using the OpenOffice SDK), send it the Word document, and it will
convert it to text.
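
A rough sketch of the first half of that setup, in case it helps. It only
builds the command line that starts OpenOffice as a socket listener and the
matching UNO URL a client would connect to; it does not do the conversion.
The host, the port 8100, the class and method names, and the "-headless" flag
are my assumptions for illustration, and the option spelling can differ
between versions (some builds want "--accept" rather than "-accept"), so
check it against your install.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: host, port, and flag spelling are assumptions for
// illustration, not details of any particular OpenOffice release.
final class OpenOfficeListener {

    // Connection description handed to soffice through its accept option.
    static String acceptString(String host, int port) {
        return "socket,host=" + host + ",port=" + port + ";urp;";
    }

    // UNO URL that a client passes to the SDK's URL resolver to reach
    // the listening office process.
    static String unoUrl(String host, int port) {
        return "uno:" + acceptString(host, port) + "StarOffice.ComponentContext";
    }

    // Command line that starts the office without a UI and makes it listen.
    static List<String> launchCommand(String sofficePath, String host, int port) {
        return Arrays.asList(sofficePath, "-headless",
                "-accept=" + acceptString(host, port));
    }

    public static void main(String[] args) {
        // Print what would be run and the URL a client would use.
        System.out.println(String.join(" ",
                launchCommand("soffice", "localhost", 8100)));
        System.out.println(unoUrl("localhost", 8100));
    }
}
```

To actually start the process you could pass the list to
new ProcessBuilder(...).start(). The connection and the conversion to text
then go through the SDK classes (resolving the URL, loading the document,
storing it with a text filter), which need the OpenOffice jars on the
classpath.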

You can also use OpenOffice to parse various other document formats. The URL:
www.openoffice.org

Note: I've never tried OpenOffice under Windows, so I'm not sure how well it
will work there, but we are using it here to index our Word documents.

Regards,

-- 
Victor Hadianto
---------------
More are taken in by hope than by cunning. -- Vauvenargues
