> I have been looking for PDF and Word document parsers. I have tried the > contributions page on the Lucene site as suggested by a Lucene User. The > PJEtymon does not have a Windows version. The XPDF does not do the parsing > very well.
I've run Etymon with some degree of success in window boxes. To parse word document you can have a look for OpenOffice. You can start OpenOffice to receive a socket connection. From your Java app, you open a connection to OpenOffice (using OpenOffice SDK), send the word document and it will convert it to text. You can also use OpenOffice various other parsing. The url: www.openoffice.org Note: I've never tried OpenOffice under windows, so I'm not sure how it will work, but we are using it here to index our word documents. Regards, -- Victor Hadianto --------------- More are taken in by hope than by cunning. -- Vauvenargues -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
