[
https://issues.apache.org/jira/browse/PDFBOX-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093152#comment-14093152
]
Thomas Chojecki commented on PDFBOX-2250:
-----------------------------------------
[~lehmi]
"All fixes and improvements are targeting the non-sequential parser and I won't
port those changes to the old parser."
As far as I remember, the old parser already has this feature, or a similar
one. It was needed as a fix for a third-party library that creates documents
whose offsets are mismatched by 2 or 3 bytes. You can find it in the PDFParser
class, line 923 (resolveConflicts).
https://github.com/apache/pdfbox/blob/trunk/pdfbox/src/main/java/org/apache/pdfbox/pdfparser/PDFParser.java#L923
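To illustrate the kind of small-offset repair meant here, a minimal sketch follows. It is not the code from PDFParser.resolveConflicts; the class name, the method names and the maxDelta parameter are all made up for this example. It only assumes that the xref entry is off by a few bytes and that the object header "objNr gen obj" can be found close to the stated offset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not PDFBox code: look for an object header near a
// possibly wrong xref offset.
public class OffsetRepairSketch {

    /**
     * Checks the stated offset first, then positions at growing distance up to
     * maxDelta bytes on either side. Returns the corrected offset, or -1 if no
     * header "objNr gen obj" is found within that range.
     */
    static int findObjectHeader(byte[] pdf, int statedOffset, int objNr, int gen, int maxDelta) {
        byte[] header = (objNr + " " + gen + " obj").getBytes(StandardCharsets.US_ASCII);
        for (int delta = 0; delta <= maxDelta; delta++) {
            if (matchesAt(pdf, statedOffset + delta, header)) {
                return statedOffset + delta;
            }
            if (matchesAt(pdf, statedOffset - delta, header)) {
                return statedOffset - delta;
            }
        }
        return -1;
    }

    /** True if pattern starts at pos and is not the tail of a longer number. */
    static boolean matchesAt(byte[] data, int pos, byte[] pattern) {
        if (pos < 0 || pos + pattern.length > data.length) {
            return false;
        }
        // Reject "11 0 obj" when searching for "1 0 obj".
        if (pos > 0 && data[pos - 1] >= '0' && data[pos - 1] <= '9') {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (data[pos + i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "1 0 obj" really starts at byte 9, but the xref is assumed to say 12.
        byte[] pdf = "%PDF-1.4\n1 0 obj\n<< >>\nendobj\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(findObjectHeader(pdf, 12, 1, 0, 3)); // prints 9
    }
}
```

With a tolerance of only a few bytes the search stays inside the neighbourhood of the real header. The larger maxDelta gets, the more unrelated bytes are scanned.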
I haven't read the whole conversation, but you wrote something about a 200-byte
self-healing range. This can cause problems with PDFs that are broken and
include PDF documents as file attachments. The FlateDecode algorithm sometimes
does not compress each block, so it will leave some plaintext PDF blocks which
can contain parts like "endstream" or "endobj". In this case it can happen that
the self-healing algorithm runs into such an uncompressed block and fails to
read the object.
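The uncompressed-block case can be reproduced with the JDK alone. The sketch below is only an illustration, not PDFBox code: the class and method names are invented, and Deflater.NO_COMPRESSION is used to force the "stored" block type that deflate is also allowed to choose on its own. The embedded object text then appears verbatim inside the zlib stream, keywords included.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical demo: a zlib stream made of stored blocks still contains the
// PDF keywords of the embedded data in plain text.
public class StoredBlockDemo {

    /** Deflates data at the given level and returns the zlib stream. */
    static byte[] deflate(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** True if needle occurs as a byte sequence somewhere in haystack. */
    static boolean contains(byte[] haystack, byte[] needle) {
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for a PDF that is attached to another PDF.
        byte[] embedded = "1 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] stored = deflate(embedded, Deflater.NO_COMPRESSION);
        byte[] endobj = "endobj".getBytes(StandardCharsets.US_ASCII);
        byte[] endstream = "endstream".getBytes(StandardCharsets.US_ASCII);
        System.out.println(contains(stored, endobj));    // prints true
        System.out.println(contains(stored, endstream)); // prints true
    }
}
```

A scanner that searches the raw file bytes for "endobj" or "endstream" cannot tell such a hit from a real keyword of the outer document.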
I hope you understand what I mean :-)
PS: an off-topic note. I think the signature implementation only works with
the old parser. So maybe someone can post this info on the website if the
default parser implementation changes.
> Improve XRef self healing mechanism
> -----------------------------------
>
> Key: PDFBOX-2250
> URL: https://issues.apache.org/jira/browse/PDFBOX-2250
> Project: PDFBox
> Issue Type: Improvement
> Components: Parsing
> Affects Versions: 1.8.6, 1.8.7, 2.0.0
> Reporter: Andreas Lehmkühler
> Assignee: Andreas Lehmkühler
>
> PDFBOX-1769 introduced a "self healing" mechanism to repair corrupt XRef
> offsets. But that one was just a starter and there remain a lot of issues to
> be solved. I'm planning to solve at least some of them.
> All fixes and improvements are targeting the non-sequential parser and I
> won't port those changes to the old parser.
--
This message was sent by Atlassian JIRA
(v6.2#6252)