I just took a look at JPedal and I'm very impressed. Extracted some text as XML data no problem.
Amazingly, it also creates thumbnails of the PDF file, which is something I've needed but couldn't find... :)

Regards,
Kelvin

On Wed, 10 Jul 2002 09:59:32 +0200, Jose Galiana wrote:
>Hi,
>
>I've used JPedal ( www.jpedal.org ). It's distributed under the LGPL
>license and extracts raw text, among other uses.
>
>I wrote code to extract text using the Etymon PJ library. For PDFs
>with proprietary fonts, I needed to create a cross table to translate
>Unicode to ASCII, because Distiller inserts only a subset of the
>Unicode table for each proprietary font.
>
>JPedal has no problem with those fonts and extracts all the text as
>XML, suitable for use with Lucene.
>
>-----Original Message-----
>From: Ben Litchfield [mailto:[EMAIL PROTECTED]]
>Sent: Tuesday, 09 July 2002 16:48
>To: [EMAIL PROTECTED]
>Subject: PDF Text Stripper
>
>Hi,
>
>I have written a PDF library that can be used to strip text from PDF
>documents. It is released under LGPL so have fun.
>
>There is one class which can be used to easily index PDF documents.
>pdfparser.searchengine.lucene.LucenePDFDocument has a getDocument
>method which will take a PDF file and return a Lucene Document which
>you can add to an index.
>
>If you would like to see the quality of the text extraction, you can
>run pdfparser.Main from the command line, which will take a PDF
>document and write a txt file.
>
>I am looking for any input that you might have. Please mail me if you
>have any bugs or feature requests.
>
>The library can be retrieved from
>http://www.csh.rit.edu/~ben/projects/pdfparser/
>
>-Ben Litchfield

--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>