+1

I ran a comparison with 2.0.5-rc1 and (I think) 2.0.4 against ~500k files from our regression corpus.
I haven't had a chance to do much digging, but I wanted to share what I had as soon as I had it. Reports are here:

https://github.com/tballison/share/blob/master/pdfbox_comparisons/reports_pdfbox_2.0.5-rc1.zip

Lots more "common words". Many fewer exceptions. There may be a regression that is causing 244 new exceptions, but on balance, the improvements are impressive.

java.io.IOException: Missing root object specification in trailer.
    at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2169)
    at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:222)
    at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:271)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:984)
    at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:922)
    at ...

-----Original Message-----
From: Timo Boehme [mailto:[email protected]]
Sent: Tuesday, March 14, 2017 9:11 AM
To: [email protected]
Subject: Re: [VOTE] Release Apache PDFBox 2.0.5

Hi,

+1

Maybe we should add the
-Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true
setting (introduced with 2.0.4) to the Migration/Getting Started Web-Pages. I had to look through my emails in order to find it and it really makes a difference (at least on some systems) if there are a lot of images on a page - so far we only have the
-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
setting documented (which did not help in my case).

At least the user may try it out if rendering gets slow on some pages; it may not be a good general setting as it also may slow rendering down a bit on pages with few large images.

Best,
Timo

On 13.03.2017 at 19:18, Andreas Lehmkuehler wrote:
> Hi,
>
> a candidate for the PDFBox 2.0.5 release is available at:
>
> https://dist.apache.org/repos/dist/dev/pdfbox/2.0.5/
>
> The release candidate is a zip archive of the sources in:
>
> http://svn.apache.org/repos/asf/pdfbox/tags/2.0.5/
>
> The SHA1 checksum of the archive is
> 9521349be859498dfdd0e0f2a5d02b082f097ab1.
>
> Please vote on releasing this package as Apache PDFBox 2.0.5.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PDFBox PMC votes are cast.
>
> [ ] +1 Release this package as Apache PDFBox 2.0.5
> [ ] -1 Do not release this package because...
>
>
> Here is my +1
>
> BR
> Andreas Lehmkühler
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

--
Timo Boehme
OntoChem IT Solutions GmbH
Blücherstraße 24
06120 Halle (Saale)
Germany

phone: +49 345 478 047 4 | fax: +49 345 478 047 1
email: [email protected] | web: www.ontochem.com
HRB 21962 Amtsgericht Stendal | USt-IdNr.: DE815563824
managing director : Lutz Weber
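
For anyone who wants to try the two settings mentioned in Timo's message without touching the launch command, here is a minimal sketch that sets them from code via System.setProperty, which has the same effect on System.getProperty lookups as passing -D on the command line. The class and method names (RenderingProperties, enablePureJavaCmyk, useKcmsProvider) are made up for illustration; only the property names and the KCMS value are taken from the message. It assumes the properties are set at startup, before any PDFBox or Java2D color classes have been loaded, since a value that has already been read may not be picked up later.

```java
// Hypothetical helper class; not part of PDFBox.
class RenderingProperties {
    // Property name as quoted in the thread (introduced with PDFBox 2.0.4).
    static final String PURE_JAVA_CMYK =
            "org.apache.pdfbox.rendering.UsePureJavaCMYKConversion";

    // The already documented Java2D setting and its value, as quoted in the thread.
    static final String JAVA2D_CMM = "sun.java2d.cmm";
    static final String KCMS_PROVIDER = "sun.java2d.cmm.kcms.KcmsServiceProvider";

    // Same effect as -Dorg.apache.pdfbox.rendering.UsePureJavaCMYKConversion=true
    static void enablePureJavaCmyk() {
        System.setProperty(PURE_JAVA_CMYK, "true");
    }

    // Same effect as -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
    static void useKcmsProvider() {
        System.setProperty(JAVA2D_CMM, KCMS_PROVIDER);
    }

    public static void main(String[] args) {
        // Call this first, before loading or rendering any document (assumption:
        // the setting is read when the color conversion code is first used).
        enablePureJavaCmyk();
        System.out.println(PURE_JAVA_CMYK + "=" + System.getProperty(PURE_JAVA_CMYK));
    }
}
```

As noted in the message, this is something to try when rendering gets slow on image-heavy pages rather than a general default, so it may be worth keeping it behind an application-level switch.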
