There are different Parsers available - every Parser has other advantages and disadvantages. I use a combination of the PDFBox http://www.pdfbox.org/ and Etymon PJ http://www.etymon.com/pjc/, cause their APIs are very simple. Both of them parse PDF in a format of their own an provide interfaces to get the PDF Documents contents.
Other developers on this list prefer JPedal http://www.jpedal.org/ which parses PDF into XML an provide a XML Tree with the PDF Documents contentsest, but the Documentation isn´t very detailed. Micha -----Ursprüngliche Nachricht----- Von: Thomas Chacko [mailto:[EMAIL PROTECTED]] Gesendet: Freitag, 22. November 2002 15:26 An: Lucene Users List Betreff: PDF parser Whats the best parser available to extarct text from PDF documents. Expecting a reply ASAP Thanks in advance Thomas Chacko -- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>