[ https://issues.apache.org/jira/browse/PDFBOX-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056686#comment-13056686 ]
Adam Nichols commented on PDFBOX-1037:
--------------------------------------
If you do not use the "force" option and the parser does not throw an
exception, then it probably parsed everything correctly, but there's no way to
know for sure.
PDFBOX-911 is a similar issue and Andreas and I agreed that "we need a
conforming parser" to really solve the issue properly.
There was another very recent thread (PDFBOX-1016) which was related to the way
the xref tables are used to read in objects. A PDF can have two objects with
the exact same object number and revision (when there are incremental
updates). Which one is actually used is dictated by the XRef tables, and the
thread was about how the current code does not parse the XRef tables in the
correct order. I think the fix for PDFBOX-1016 may resolve the issue that you
are facing. The code that Thomas referenced is in the resolveConflicts()
method, which is the current way of dealing with multiple objects with the
same object number and revision.
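To illustrate why the order matters, here is a minimal sketch of the idea, not
the PDFBox code and not what resolveConflicts() does. It assumes the xref
sections are already parsed and handed over oldest first; the class
XrefOrderSketch, the Entry type and the resolve() method are all hypothetical
names made up for this example. Entries from a later incremental update
overwrite earlier entries with the same object number and revision.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in PDFBox.
public class XrefOrderSketch {

    /** One xref entry: object number, revision (generation), byte offset. */
    static final class Entry {
        final int objectNumber;
        final int generation;
        final long offset;

        Entry(int objectNumber, int generation, long offset) {
            this.objectNumber = objectNumber;
            this.generation = generation;
            this.offset = offset;
        }
    }

    /**
     * Merges xref sections that are supplied oldest first. When two sections
     * list the same "objectNumber generation" key, the later section wins,
     * which is the behaviour incremental updates rely on.
     */
    static Map<String, Long> resolve(List<List<Entry>> sectionsOldestFirst) {
        Map<String, Long> offsets = new LinkedHashMap<String, Long>();
        for (List<Entry> section : sectionsOldestFirst) {
            for (Entry e : section) {
                offsets.put(e.objectNumber + " " + e.generation, e.offset);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Original file body: objects 3 and 4.
        List<Entry> original = Arrays.asList(
                new Entry(3, 0, 120L), new Entry(4, 0, 300L));
        // Incremental update: object 3 is rewritten at a new offset.
        List<Entry> update = Arrays.asList(new Entry(3, 0, 5200L));

        Map<String, Long> offsets = resolve(Arrays.asList(original, update));
        // Object 3 now resolves to the updated copy at offset 5200.
        System.out.println(offsets);
    }
}
```

If the sections were merged in the opposite order, the stale copy of object 3
at offset 120 would win, which is the kind of symptom described in
PDFBOX-1016.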
So, the short answer is "no, not with 100% accuracy with the current codebase,
but try 1.6.0 when it comes out in a few hours and see if the patch for
PDFBOX-1016 helps."
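In the meantime, a rough way to check whether a file has incremental updates
at all is to count its %%EOF markers. The sketch below uses only the JDK, not
PDFBox; the class EofMarkerCount and the method countEofMarkers() are
hypothetical names for this example. It is a heuristic: the byte sequence
could also occur inside stream data, so treat a count above 1 as a hint
rather than proof.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for diagnosis; not part of PDFBox.
public class EofMarkerCount {

    /** Counts non-overlapping occurrences of the bytes "%%EOF". */
    static int countEofMarkers(byte[] pdfBytes) {
        byte[] marker = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        int count = 0;
        for (int i = 0; i + marker.length <= pdfBytes.length; i++) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (pdfBytes[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
                i += marker.length - 1; // skip past this marker
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EofMarkerCount <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("%%EOF markers found: " + countEofMarkers(data));
    }
}
```

A count greater than 1 suggests the file was saved incrementally, in which
case the XRef ordering problem described above is a plausible cause.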
> PDF with multiple %%EOF only parses one page
> --------------------------------------------
>
> Key: PDFBOX-1037
> URL: https://issues.apache.org/jira/browse/PDFBOX-1037
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.5.0
> Environment: Windows XP - Java SE 1.6
> Reporter: Abraham Farris
> Attachments: blankpageproblemmod.pdf, blankpageproblemmod.png
>
>
> Any type of page count (getDocumentCatalog().getPages().getCount()) only
> returns 1. Doing a simple .load and .save will strip out all pages after
> the first.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira