[jira] [Commented] (PDFBOX-1037) PDF with multiple %%EOF only parses one page

Adam Nichols (JIRA) Tue, 28 Jun 2011 11:10:43 -0700

    [ https://issues.apache.org/jira/browse/PDFBOX-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056686#comment-13056686 ]


Adam Nichols commented on PDFBOX-1037:
--------------------------------------

If you do not use the "force" option and the parser does not throw an 
exception, then it probably parsed everything correctly, but there is no way 
to know for sure.  PDFBOX-911 is a similar issue, and Andreas and I agreed 
that "we need a conforming parser" to really solve the issue properly.

There was another very recent thread (PDFBOX-1016) related to the way the xref 
parsing reads in objects.  A PDF can contain two objects with exactly the same 
object number and revision (this happens with incremental updates).  Which one 
is actually used is dictated by the XRef tables, and that thread was about the 
current code not parsing the XRef tables in the correct order.  I think the 
fix for it may resolve the issue that you are facing.  The code that Thomas 
referenced is in the resolveConflicts() method, which is the current way of 
dealing with multiple objects that share an object number and revision.
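
To illustrate why the order of the XRef tables matters, here is a minimal 
sketch in plain Java.  It is not PDFBox code and does not reproduce 
resolveConflicts(); the class and method names (XrefResolutionSketch, 
XrefSection, resolve) are hypothetical.  It assumes the XRef sections have 
already been read and are listed newest first, so the first entry seen for a 
given object number and revision is the one that should win.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in PDFBox.
final class XrefResolutionSketch {

    /** One xref section: maps "objectNumber revision" keys to byte offsets. */
    static final class XrefSection {
        final Map<String, Long> offsets = new HashMap<>();

        XrefSection put(int objectNumber, int revision, long offset) {
            offsets.put(objectNumber + " " + revision, offset);
            return this;
        }
    }

    /**
     * Merges xref sections that are ordered newest first.  An entry from a
     * newer section is never overwritten by an entry from an older one.
     */
    static Map<String, Long> resolve(List<XrefSection> newestFirst) {
        Map<String, Long> resolved = new HashMap<>();
        for (XrefSection section : newestFirst) {
            for (Map.Entry<String, Long> entry : section.offsets.entrySet()) {
                // Keep the offset already recorded by a newer section, if any.
                resolved.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return resolved;
    }
}
```

If the same sections were merged oldest first with this keep-the-first rule, 
the stale copy of an updated object would be picked instead, which is the 
kind of ordering mistake the PDFBOX-1016 thread was about.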

So, the short answer is "no, not with 100% accuracy with the current codebase, 
but try 1.6.0 when it comes out in a few hours and see if the patch for 
PDFBOX-1016 helps."
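
In the meantime, one rough way to flag files that may be affected is to count 
the %%EOF markers before handing the file to the parser.  The sketch below is 
plain Java, not part of PDFBox, and the names (EofMarkerCount, 
countEofMarkers) are hypothetical.  It is only a heuristic: the byte sequence 
could also occur inside stream data, so a count above one means "look more 
closely", not "definitely incrementally updated".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not a PDFBox API.
final class EofMarkerCount {

    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Counts non-overlapping occurrences of the %%EOF marker in the raw bytes. */
    static int countEofMarkers(byte[] data) {
        int count = 0;
        for (int i = 0; i + MARKER.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
                // Skip past this marker; the loop increment adds the last step.
                i += MARKER.length - 1;
            }
        }
        return count;
    }
}
```

The input can be obtained with java.nio.file.Files.readAllBytes(...) on the 
PDF in question.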

> PDF with multiple %%EOF only parses one page
> --------------------------------------------
>
>                 Key: PDFBOX-1037
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1037
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.5.0
>         Environment: Windows XP - Java SE 1.6
>            Reporter: Abraham Farris
>         Attachments: blankpageproblemmod.pdf, blankpageproblemmod.png
>
>
> Any type of page counts (getDocumentCatalog().getPages().getCount()) only 
> return int 1.  Doing a simple .load and .save will strip out all pages after 
> the first.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
