Ben,
many thanks for your complrehensive answer. Unfourtunatly I can not send 
the problem pdfs cause they are the property of company and are of top 
secrecy:)

Regards,
J.




Ben Litchfield <[EMAIL PROTECTED]>
22.10.2004 14:40
Please respond to "Lucene Users List"

 
        To:     Lucene Users List <[EMAIL PROTECTED]>
        cc:     (bcc: Iouli Golovatyi/X/GP/Novartis)
        Subject:        Re: Need advice: what pdf lib to use?
        Category: 




Please post any PDFBox issues you notice on the PDFBox sourceforge bug
list, if possible attach/email any problem PDFs that you encounter.

There are some efforts underway to improve the speed of PDFBox, you can
monitor the progress at
http://sourceforge.net/tracker/index.php?func=detail&aid=1046300&group_id=78314&atid=552832

As for other suggestions, I know some people have utilized xpdf(open
source but non Java) to extract the text.

For other Java solutions
PDFTextStream(commercial) - "Fastest PDF-to-Text Solution for Java"
http://snowtide.com/home/PDFTextStream/

Etymon PJ (GPL)
http://www.etymon.com/

Ben
http://www.pdfbox.org



On Fri, 22 Oct 2004 [EMAIL PROTECTED] wrote:

> Hello all,
>
> I need a piece of advice/experience..
>
> What pdf parser (written in java) u'd recommend?
>
> I played now with PDFBox-0.6.7a and would not say I was satisfied too 
much
> with it
>
> On certain pdf's (not well formated but anyway readable with acrobate) 
it
> run into dead loop (this I could fix in code),
> and on one file it produced "out of memory error" and killed jvm:( (this
> problem I could not identify yet)
>
> After all the performance was not too great as well: it took c. 19 h. to
> index 13000 files (c. 3.5Gb)
>
> Regards,
> J.
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to