[ https://issues.apache.org/jira/browse/PDFBOX-1037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056613#comment-13056613 ]
Adam Nichols commented on PDFBOX-1037:
--------------------------------------

I've done a little work on the parser, including the %%EOF section. As far as I can tell, there will not be any negative side effects from forcing parsing if the PDF conforms to the specification.

For non-conforming documents, results may vary. The parser will do everything it possibly can to avoid throwing an exception, but it does this by skipping objects it cannot read, so it will also skip page objects if they are corrupt or non-conforming. This could cause problems with the page count, extracting text, etc.

Essentially it's just a question of: do you want to reject PDFs which might not be processed correctly and let the user know, or do your best at processing them and not tell your user when there may have been problems? If it's a fully automated system, then I'm guessing you would prefer the latter.

There's also the option of a combination. Try to parse normally. If that doesn't work, inform the user and give them the option to "do it anyway", encouraging them to check the results if they choose this option. Then try to force parse it, and if that fails too, just tell them you are sorry (and save the PDF so you, or we, can check out what the issue is and maybe enhance the parser).

> PDF with multiple %%EOF only parses one page
> --------------------------------------------
>
>                 Key: PDFBOX-1037
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1037
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.5.0
>         Environment: Windows XP - Java SE 1.6
>            Reporter: Abraham Farris
>         Attachments: blankpageproblemmod.pdf, blankpageproblemmod.png
>
>
> Any type of page counts (getDocumentCatalog().getPages().getCount()) only
> return int 1. Doing a simple .load and .save will strip out all pages after
> the first.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
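
The combined approach suggested in the comment above (parse normally, offer a forced parse on failure, keep the file if that also fails) could be sketched roughly as follows. This is only an illustration of the control flow, not PDFBox code: `FallbackLoader`, `Loader`, `Outcome`, `Result` and `loadWithFallback` are hypothetical names, and the two loaders are placeholders that a caller would back with whatever strict and forced parse calls their PDFBox version provides.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; none of these names come from PDFBox.
final class FallbackLoader {

    /** Stand-in for a parse call; a real caller would wrap its PDF library here. */
    @FunctionalInterface
    public interface Loader<T> {
        T load() throws Exception;
    }

    public enum Outcome { PARSED_NORMALLY, PARSED_FORCED, DECLINED, FAILED }

    public static final class Result<T> {
        public final Outcome outcome;
        public final T document; // null unless one of the parses succeeded

        Result(Outcome outcome, T document) {
            this.outcome = outcome;
            this.document = document;
        }
    }

    /**
     * Tries the strict loader first. If it throws, asks the user whether to
     * continue, then tries the forced loader. If the forced loader also
     * throws, runs keepFileForAnalysis so the PDF can be inspected later.
     */
    public static <T> Result<T> loadWithFallback(
            Loader<T> strict,
            Loader<T> forced,
            BooleanSupplier userAcceptsForcedParse,
            Runnable keepFileForAnalysis) {
        try {
            return new Result<>(Outcome.PARSED_NORMALLY, strict.load());
        } catch (Exception strictFailure) {
            // Strict parsing failed; fall through to the "do it anyway" question.
        }
        if (!userAcceptsForcedParse.getAsBoolean()) {
            return new Result<>(Outcome.DECLINED, null);
        }
        try {
            // The caller should warn the user to check the result, since
            // corrupt objects (including pages) may have been skipped.
            return new Result<>(Outcome.PARSED_FORCED, forced.load());
        } catch (Exception forcedFailure) {
            keepFileForAnalysis.run();
            return new Result<>(Outcome.FAILED, null);
        }
    }
}
```

In a fully automated system, `userAcceptsForcedParse` could simply be `() -> true`, which corresponds to the "do your best and don't ask" choice described in the comment.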