number of pages returns the incorrect number for some PDFs
----------------------------------------------------------

                 Key: PDFBOX-944
                 URL: https://issues.apache.org/jira/browse/PDFBOX-944
             Project: PDFBox
          Issue Type: Bug
            Reporter: Adam Nichols


This is a regression bug which appeared between 1.3.1 and 1.4.0, as the former 
returns the correct page count while the latter does not.  Unfortunately, the 
PDF which demonstrates this problem is confidential, so I can not attach it 
here, however I will describe the things which may be causing this problem as 
best I can.

The problem does not occur after using the "uncompress" feature of pdftk.  The 
problem does not occur after using PdfDecompressor from PDFBox.  The original 
file which was given to me is Linearized.  In Adobe Acrobat Standard -> File -> 
Properties, it says the Application was "Adobe Photoshop CS4 Windows", the PDF 
Producer was "Adobe Photoshop for Windows -- Image Conversion Plug-in" and the 
PDF Version is 1.7 (Acrobat 8.x).  Fast Web View is set to "No".  I suspect 
that the problem has to do with the fact that it's Lineraized or the fact that 
it uses ObjStm.  I don't have enough time to trace through this, so I'm either 
going to revert back to PDFBox 1.3.1 or pre-process all the ObjStm objects, 
save the uncompressed file, and then process that.  The latter is less 
efficient, but I think it'll handle more cases.  I just wanted to make sure to 
open an issue here on JIRA so we can eventually get a proper solution to this 
problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to