[
https://issues.apache.org/jira/browse/PDFBOX-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520164#comment-17520164
]
Andreas Lehmkühler commented on PDFBOX-5413:
--------------------------------------------
I've added another check which ignores the longer object number if no such
object is known, so that it doesn't trigger the brute force search. In this case
the expected object at the given offset is {{11 0}}. The check added in
PDFBOX-5399 detects the {{5}} immediately preceding the object number, assumes
that something has to be wrong and triggers the brute force search. In the end
the object is read as {{511 0}}, and object {{11 0}} is missing.
In this case there isn't any definition for an object with the number 511, so
"fixing" the obviously malformed PDF by replacing {{11 0}} with {{511 0}} leads
to missing content. Instead, let's assume that the offset and the object found
there are correct, and that the extra digit {{5}} is garbage left over from the
previous object.
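The decision described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual code in PDFBox's parser: the method names ({{readObjectHeader}}, {{numberIncludingPrecedingDigits}}, {{needsBruteForce}}) are made up, and a plain {{Set<Long>}} of object numbers stands in for the parsed xref table. The idea is to distrust the offset only if the digits in front of it would form the number of an object that actually exists.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;

/** Hypothetical sketch (not PDFBox code) of validating an xref offset. */
public class XrefOffsetCheck {

    private static boolean isDigit(byte b) {
        return b >= '0' && b <= '9';
    }

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t' || b == '\f' || b == 0;
    }

    /** Parses "<num> <gen> obj" starting at offset; returns {num, gen} or null. */
    static long[] readObjectHeader(byte[] pdf, int offset) {
        int pos = offset;
        long num = 0;
        int start = pos;
        while (pos < pdf.length && isDigit(pdf[pos])) {
            num = num * 10 + (pdf[pos++] - '0');
        }
        if (pos == start) return null;            // no object number
        start = pos;
        while (pos < pdf.length && isWhitespace(pdf[pos])) pos++;
        if (pos == start) return null;            // number and generation must be separated
        long gen = 0;
        start = pos;
        while (pos < pdf.length && isDigit(pdf[pos])) {
            gen = gen * 10 + (pdf[pos++] - '0');
        }
        if (pos == start) return null;            // no generation number
        while (pos < pdf.length && isWhitespace(pdf[pos])) pos++;
        String keyword = new String(pdf, pos, Math.min(3, pdf.length - pos), StandardCharsets.US_ASCII);
        return keyword.equals("obj") ? new long[] {num, gen} : null;
    }

    /** Object number obtained if the digits directly before offset are included (e.g. 5 + 11 = 511). */
    static long numberIncludingPrecedingDigits(byte[] pdf, int offset) {
        int start = offset;
        while (start > 0 && isDigit(pdf[start - 1])) start--;
        long num = 0;
        for (int pos = start; pos < pdf.length && isDigit(pdf[pos]); pos++) {
            num = num * 10 + (pdf[pos] - '0');
        }
        return num;
    }

    /** Returns true if the xref offset should be distrusted and a brute force search started. */
    static boolean needsBruteForce(byte[] pdf, int offset, long expectedNum, long expectedGen,
                                   Set<Long> knownObjectNumbers) {
        long[] header = readObjectHeader(pdf, offset);
        if (header == null || header[0] != expectedNum || header[1] != expectedGen) {
            return true;                          // offset doesn't point at the expected object
        }
        if (offset > 0 && isDigit(pdf[offset - 1])) {
            // A digit precedes the object number. Only distrust the offset if the
            // longer number refers to a known object; otherwise treat the digit as
            // garbage belonging to the previous object and keep the offset.
            return knownObjectNumbers.contains(numberIncludingPrecedingDigits(pdf, offset));
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "1 0 obj\n<< >>\nendobj\n511 0 obj\n(field text)\nendobj\n";
        byte[] pdf = text.getBytes(StandardCharsets.US_ASCII);
        int offset = text.indexOf("11 0 obj");    // xref offset of object 11, right after the stray 5
        System.out.println(needsBruteForce(pdf, offset, 11, 0, Set.of(1L, 11L)));       // false
        System.out.println(needsBruteForce(pdf, offset, 11, 0, Set.of(1L, 11L, 511L))); // true
    }
}
```

With no object 511 in the xref, the stray {{5}} is ignored and object {{11 0}} is read from the given offset; only if 511 were a defined object would the offset be treated as suspicious.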
> Field text missing
> ------------------
>
> Key: PDFBOX-5413
> URL: https://issues.apache.org/jira/browse/PDFBOX-5413
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 2.0.26, 3.0.0 PDFBox
> Reporter: Tilman Hausherr
> Priority: Major
> Labels: regression
> Fix For: 2.0.26, 3.0.0 PDFBox
>
> Attachments: CZIB6B5RY5HQDSEXXWSGUHSAP75CAI7Q.pdf
>
>
> The bottom field on page 2 ("AREA OF CONSIDERATION") is missing.
> This worked in 2.0.25. This is a weird case: incrementally written object 11
> points to 0000102796. However there is a "5" just before the 11.