number of pages returns the incorrect number for some PDFs
----------------------------------------------------------
Key: PDFBOX-944
URL: https://issues.apache.org/jira/browse/PDFBOX-944
Project: PDFBox
Issue Type: Bug
Reporter: Adam Nichols
This is a regression that appeared between 1.3.1 and 1.4.0: the former
returns the correct page count, while the latter does not. Unfortunately, the
PDF that demonstrates the problem is confidential, so I cannot attach it
here. Instead, I will describe the things that may be causing the problem as
best I can.
The problem does not occur after running the file through the "uncompress"
feature of pdftk, nor after running it through PdfDecompressor from PDFBox.
The original file that was given to me is Linearized. In Adobe Acrobat
Standard, File -> Properties reports the Application as "Adobe Photoshop CS4
Windows", the PDF Producer as "Adobe Photoshop for Windows -- Image Conversion
Plug-in", and the PDF Version as 1.7 (Acrobat 8.x). Fast Web View is set to
"No".
I suspect the problem is related to the file being Linearized or to its use
of object streams (ObjStm). I don't have time to trace through this, so I
will either revert to PDFBox 1.3.1 or pre-process the files by decompressing
all the ObjStm objects, saving the uncompressed file, and processing that
instead. The latter is less efficient, but I think it will handle more cases.
I wanted to open an issue here on JIRA so that we can eventually get a proper
fix for this problem.
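For anyone who wants to check whether their own files share the two suspected
traits (Linearized, ObjStm) without attaching them, here is a minimal sketch
in plain Java. It does not use PDFBox and it is not how PDFBox detects either
feature; it only scans the raw bytes for the "/Linearized" and "/ObjStm" name
tokens. The class and method names (PdfStructureProbe, looksLinearized,
usesObjectStreams) are made up for illustration, and the 1024-byte window
assumes the linearization dictionary sits at the very start of the file, as
the PDF specification requires. Treat a match as a hint, not as proof.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Heuristic triage only: scans raw bytes, does not parse the PDF. */
final class PdfStructureProbe {

    /** Returns true if the ASCII token occurs anywhere in the data. */
    static boolean containsToken(byte[] data, String token) {
        byte[] needle = token.getBytes(StandardCharsets.US_ASCII);
        outer:
        for (int i = 0; i + needle.length <= data.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    /** Assumption: the linearization dictionary lies in the first 1024 bytes. */
    static boolean looksLinearized(byte[] data) {
        byte[] head = Arrays.copyOf(data, Math.min(data.length, 1024));
        return containsToken(head, "/Linearized");
    }

    /** Stream dictionaries are stored uncompressed, so the name is visible. */
    static boolean usesObjectStreams(byte[] data) {
        return containsToken(data, "/ObjStm");
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] data = Files.readAllBytes(Paths.get(arg));
            System.out.println(arg
                    + ": linearized=" + looksLinearized(data)
                    + ", objstm=" + usesObjectStreams(data));
        }
    }
}
```

Running it over a set of PDFs and comparing the flags against the files whose
page count is wrong under 1.4.0 should show whether one of the two traits
lines up with the failures.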