[ https://issues.apache.org/jira/browse/PDFBOX-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055801#comment-13055801 ]
Abraham Farris commented on PDFBOX-1037:
----------------------------------------
I understand it is broken; I stripped out the content using a text editor
because of the sensitive nature of the data. However, if you open the
"broken document" with Adobe Reader, it correctly reports a page count of
six.
When I try to get the page count with PDFBox, it returns only one. The
multiple %%EOF markers seem to be causing the issue. A file with multiple
%%EOF markers is still considered a valid PDF.
From what I can gather (see the reference below):
Each file ends with the letters %%EOF, but there can be multiple EOF's in a
single file (this often confuses programs like foremost that search for
footers).
http://www.forensicswiki.org/wiki/PDF
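To check whether a given file is affected, the %%EOF markers can be counted directly. Below is a minimal sketch, assuming a plain byte scan is good enough for a quick check. EofMarkerCounter and countEofMarkers are hypothetical names, not part of PDFBox, and the code uses java.nio.file, so it needs Java 7 or later. A raw scan can also match the byte sequence inside stream data, so treat the result as a hint rather than a definitive count.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; this is NOT a PDFBox class.
class EofMarkerCounter {

    // The end-of-file marker as raw ASCII bytes.
    private static final byte[] MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Counts non-overlapping occurrences of "%%EOF" in the given bytes.
    static int countEofMarkers(byte[] data) {
        int count = 0;
        int i = 0;
        while (i <= data.length - MARKER.length) {
            if (matchesAt(data, i)) {
                count++;
                i += MARKER.length; // skip past the marker just found
            } else {
                i++;
            }
        }
        return count;
    }

    // Returns true if MARKER occurs in data starting at offset.
    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < MARKER.length; j++) {
            if (data[offset + j] != MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EofMarkerCounter <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(countEofMarkers(data) + " %%EOF marker(s) found");
    }
}
```

A count above one usually means the file was saved with incremental updates, each of which appends its own trailer and %%EOF. That is consistent with a reader that only honours the first (or an outdated) cross-reference section seeing fewer pages than Adobe Reader does.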
> PDF with multiple %%EOF only parses one page
> --------------------------------------------
>
> Key: PDFBOX-1037
> URL: https://issues.apache.org/jira/browse/PDFBOX-1037
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.5.0
> Environment: Windows XP - Java SE 1.6
> Reporter: Abraham Farris
> Attachments: blankpageproblemmod.pdf
>
>
> Any page count call (e.g. getDocumentCatalog().getPages().getCount())
> returns only 1. A simple .load followed by .save strips out all pages
> after the first.