Kelvin,

Thanks for your quick reply. I think Etymon is built for Linux/Unix platforms, and I am working on Windows.
Anita

----- Original Message -----
From: "Kelvin Tan" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, April 19, 2002 10:25 AM
Subject: Re: PDF / Word document parsers

> Anita,
>
> I've had a moderate amount of success using Etymon for PDF parsing.
> It consumes quite a lot of memory for larger PDF documents, but otherwise
> it's OK. What difficulties are you facing?
>
> For MS Word parsing, the Jakarta POI project is working on something, but
> in the meantime I've managed to search MS Word documents by reading the
> file and stripping out the nonsense characters. It's a hack, I think, but if I
> increase the IndexWriter's maxFieldLength to about a million, I can search
> 13-15 MB Word documents with ease.
>
> Kelvin
>
> ----- Original Message -----
> From: "Anita Srinivas" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Friday, April 19, 2002 2:13 PM
> Subject: PDF / Word document parsers
>
> Hi...
>
> I have been looking for PDF and Word document parsers. I have tried the
> contributions page on the Lucene site, as suggested by a Lucene user.
> Etymon PJ does not have a Windows version, and XPDF does not parse
> documents very well.
>
> Can someone suggest better Word document or PDF parsers than the
> ones I mentioned here?
>
> Thanks
>
> Anita Srinivas
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
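Kelvin's Word workaround (read the raw file, throw away the binary "nonsense", index what is left) can be sketched roughly as below. This is a guess at the general idea, not his actual code: the class name `WordTextStripper`, the printable-ASCII rule and the 4-character minimum run are all assumptions. It also assumes the document text is stored as single-byte characters; text that Word saved as 16-bit Unicode, or accented characters above 0x7F, will be lost. Raising `maxFieldLength` matters because Lucene's `IndexWriter` indexes only the first 10,000 terms of a field by default, so the tail of a 13-15 MB document would otherwise be silently dropped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical sketch of the "strip out nonsense characters" hack for binary
 * .doc files: keep runs of printable ASCII that are at least minRun long and
 * join them with spaces. This is NOT a real Word parser.
 */
public class WordTextStripper {

    /** Printable ASCII plus tab counts as text; every other byte is noise. */
    private static boolean isTextByte(int b) {
        return b == '\t' || (b >= 0x20 && b < 0x7F);
    }

    /** Scans the raw bytes and returns the surviving text runs, space-separated. */
    public static String extractText(byte[] data, int minRun) {
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder();
        for (byte raw : data) {
            int b = raw & 0xFF; // treat the byte as unsigned
            if (isTextByte(b)) {
                run.append((char) b);
            } else {
                flush(run, out, minRun); // a noise byte ends the current run
            }
        }
        flush(run, out, minRun); // keep a run that reaches end of file
        return out.toString();
    }

    /** Appends the current run if it is long enough, then clears it. */
    private static void flush(StringBuilder run, StringBuilder out, int minRun) {
        String s = run.toString().trim();
        if (s.length() >= minRun) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(s);
        }
        run.setLength(0);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java WordTextStripper file.doc");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(extractText(data, 4));
        // Not run here: the text would then go into a Lucene field. In Lucene
        // 1.x, maxFieldLength is a public IndexWriter field (default 10,000
        // terms), so it would be raised first, e.g. writer.maxFieldLength = 1000000;
    }
}
```

The minimum run length is the main tuning knob: a larger value discards more binary junk but also drops short words that sit between formatting bytes.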
