Anita, I've experienced a moderate amount of success using Etymon for PDF parsing. It does consume quite alot of memory for larger PDF documents, but otherwise it's ok. What difficulties are you facing?
For MS Word parsing, The Jakarta POI project is working something out, but in the meanwhile I've managed to search MS Word documents by reading the file and stripping out nonsense characters. It's a hack I think, but if I increase the indexWriter's maxFieldLength to about a million, I can search like 13-15MB word documents with ease. Kelvin ----- Original Message ----- From: "Anita Srinivas" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Friday, April 19, 2002 2:13 PM Subject: PDF / Word document parsers Hi... I have been looking for PDF and Word document parsers. I have tried the contributions page on the Lucene site as suggested by a Lucene User. The PJEtymon does not have a Windows version. The XPDF does not do the parsing very well. Can someone suggest some better Word document or PDF parsers other than the ones I mentioned here, . Thanks Anita Srinivas -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
