[ 
https://issues.apache.org/jira/browse/PDFBOX-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055801#comment-13055801
 ] 

Abraham Farris commented on PDFBOX-1037:
----------------------------------------

I understand it is broken; I stripped out the content with a text editor 
because of the sensitive nature of the data. However, if you open the 
"broken document" in Adobe Reader, it reports the correct page count of six.

When I try to get a page count with PDFBox, it returns only one. It seems 
that the multiple %%EOF markers are causing the issue; a file with more than 
one %%EOF is still considered a valid PDF.


From what I can gather, per this reference:

Each file ends with the letters %%EOF, but there can be multiple EOF's in a 
single file (this often confuses programs like foremost that search for 
footers).

http://www.forensicswiki.org/wiki/PDF
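Multiple %%EOF markers usually come from incremental updates: each saved revision appends new objects, a new cross-reference section, a trailer and its own %%EOF, and a conforming reader locates the newest revision by starting from the end of the file. The following is a minimal sketch, using only the JDK, that lists where the markers sit in a file. It can help confirm whether a sample such as blankpageproblemmod.pdf holds more than one revision. The class EofScan and the method findEofOffsets are hypothetical helpers, not part of PDFBox. The search is a naive byte scan, so a marker that happens to appear inside stream data would also be counted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class EofScan {

    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Returns the start offset of every "%%EOF" marker in the raw bytes.
    // Naive scan (assumption): stream contents are not skipped, so a marker
    // embedded in binary or compressed data would also be reported.
    static List<Integer> findEofOffsets(byte[] pdf) {
        List<Integer> offsets = new ArrayList<>();
        outer:
        for (int i = 0; i + EOF_MARKER.length <= pdf.length; i++) {
            for (int j = 0; j < EOF_MARKER.length; j++) {
                if (pdf[i + j] != EOF_MARKER[j]) {
                    continue outer; // mismatch: try the next start position
                }
            }
            offsets.add(i);
            i += EOF_MARKER.length - 1; // jump past the marker just matched
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        byte[] data;
        if (args.length > 0) {
            // Scan a real file passed on the command line.
            data = Files.readAllBytes(Paths.get(args[0]));
        } else {
            // Hypothetical skeleton: an original revision plus one incremental update.
            String sample = "%PDF-1.4\n1 0 obj\n<<>>\nendobj\nstartxref\n9\n%%EOF\n"
                    + "2 0 obj\n<<>>\nendobj\nstartxref\n60\n%%EOF\n";
            data = sample.getBytes(StandardCharsets.US_ASCII);
        }
        List<Integer> offsets = findEofOffsets(data);
        System.out.println("%%EOF markers found: " + offsets.size());
        System.out.println("Offsets: " + offsets);
    }
}
```

Running it as `java EofScan blankpageproblemmod.pdf` prints the number of markers and their byte offsets. A count above one suggests that a parser which stops at the first %%EOF would only see the original revision.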


> PDF with multiple %%EOF only parses one page
> --------------------------------------------
>
>                 Key: PDFBOX-1037
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1037
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.5.0
>         Environment: Windows XP - Java SE 1.6
>            Reporter: Abraham Farris
>         Attachments: blankpageproblemmod.pdf
>
>
> Any page count (getDocumentCatalog().getPages().getCount()) returns only 1. 
> Doing a simple .load and .save strips out every page after the first.

--
This message is automatically generated by JIRA.
