There are several parsers available, and each has its own advantages
and disadvantages.
I use a combination of PDFBox (http://www.pdfbox.org/) and Etymon PJ
(http://www.etymon.com/pjc/) because their APIs are very simple. Both of
them parse PDF into a format of their own and provide interfaces for
getting at the PDF document's contents.

Other developers on this list prefer JPedal (http://www.jpedal.org/), which
parses PDF into XML and provides an XML tree with the PDF document's
contents, but the documentation isn't very detailed.

Micha

-----Original Message-----
From: Thomas Chacko [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 22, 2002 15:26
To: Lucene Users List
Subject: PDF parser


What's the best parser available to extract text from PDF documents?
Expecting a reply ASAP.

Thanks in advance
Thomas Chacko

