[ https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093152#comment-14093152 ]
Thomas Chojecki commented on PDFBOX-2250:
-----------------------------------------

[~lehmi] "All fixes and improvements are targeting the non-sequential parser and I won't port those changes to the old parser."

As far as I remember, the old parser already has this feature, or a similar one. It was needed as a fix for a third-party library that creates documents whose offsets are mismatched by 2 or 3 bytes. You can find it in the PDFParser class, line 923 (resolveConflicts):
https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/PDFParser.java#L923

I haven't read the whole conversation, but you wrote something about a 200-byte self-healing range. This can cause problems with broken PDFs that include PDF documents as file attachments. The FlateDecode algorithm sometimes does not compress every block, so some plaintext PDF blocks are left in the stream, and these can contain keywords like "endstream" or "endobj". In that case the self-healing algorithm can run into such an uncompressed block and fail to read the object.

I hope you understand what I mean :-)

PS: Something off-topic: I think the signature implementation only works with the old parser. So maybe someone could post this information on the website if the default parser implementation changes.

> Improve XRef self healing mechanism
> -----------------------------------
>
> Key: PDFBOX-2250
> URL: https://issues.apache.org/jira/browse/PDFBOX-2250
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 1.8.6, 1.8.7, 2.0.0
> Reporter: Andreas Lehmkühler
> Assignee: Andreas Lehmkühler
>
> PDFBOX-1769 introduced a "self healing" mechanism to repair corrupt XRef
> offsets. But that one was just a starter and there remain a lot of issues to
> be solved. I'm planning to solve at least some of them.
> All fixes and improvements are targeting the non-sequential parser and I
> won't port those changes to the old parser.

--
This message was sent by Atlassian JIRA
(v6.2#6252)