[
https://issues.apache.org/jira/browse/PDFBOX-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736752#comment-17736752
]
Andreas Lehmkühler commented on PDFBOX-5623:
--------------------------------------------
The xref stream parser no longer relies on sorted index values. I didn't
use the attached patch because it is buggy, as [~mkl] already pointed out. BUT
neither my change nor the provided patch really fixes anything. My change
has no effect on the result. The provided patch mixes up the xref
values, because it sorts the index values but not the object numbers, which
triggers the brute force parser. The latter is somehow able to repair the PDF.
[~mkl] already analysed the PDF, and it looks like a mess due to the latest
changes. I have to look into what exactly the brute force parser is doing to
"repair" the PDF.
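
To illustrate the pairing issue mentioned above, here is a minimal,
standalone sketch. It is not PDFBox code: the class and method names are
hypothetical and the sample values are made up. It only relies on the fact
that the /Index array of an xref stream holds (first object number, count)
pairs, one per subsection. Walking the pairs in the order they are written
keeps the object numbers aligned with the rows of the stream, whether or not
the subsections are sorted. Sorting the raw values, which is one way a sort
can go wrong, destroys the pairing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of PDFBox.
final class XrefIndexSketch {

    // Expands (firstObjectNumber, count) pairs into object numbers,
    // in the same order as the rows of the xref stream.
    static List<Long> expand(long[] index) {
        if (index.length % 2 != 0) {
            throw new IllegalArgumentException("/Index must hold pairs");
        }
        List<Long> objectNumbers = new ArrayList<>();
        for (int i = 0; i < index.length; i += 2) {
            long first = index[i];
            long count = index[i + 1];
            for (long n = 0; n < count; n++) {
                objectNumbers.add(first + n);
            }
        }
        return objectNumbers;
    }

    public static void main(String[] args) {
        // Made-up unsorted /Index: objects 10..11 first, then object 3.
        long[] index = {10, 2, 3, 1};
        System.out.println(expand(index)); // prints [10, 11, 3]

        // Sorting the raw values turns the pairs into (1, 2) and (3, 10),
        // so the rows would be assigned to unrelated object numbers.
        long[] flatSorted = index.clone();
        Arrays.sort(flatSorted);
        System.out.println(expand(flatSorted));
    }
}
```

If a sorted view is needed at all, the subsections would have to be reordered
together with their rows, not the bare integers.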
> Signature Image not Rendered starting with PDFBox 2.0.23 + patch provided
> -------------------------------------------------------------------------
>
> Key: PDFBOX-5623
> URL: https://issues.apache.org/jira/browse/PDFBOX-5623
> Project: PDFBox
> Issue Type: Bug
> Components: Rendering
> Affects Versions: 2.0.23, 2.0.24, 2.0.25, 2.0.26, 2.0.27, 2.0.28
> Environment: Java 8, Windows 10 and Ubuntu 22
> Reporter: Lionel Fradin
> Assignee: Andreas Lehmkühler
> Priority: Major
> Attachments:
> Fixing_the_problem_when_the_COSArray_is_not_sorted_in_increasing_order_.patch,
> PDFBOX-issue-rendering-signature.pdf, pdfbox22-page9-br.jpg,
> pdfbox23-page9-br.jpg
>
>
> We have an online service where our customers post their PDF files so that we
> can render them.
> One of our customers recently noticed that one of their signed documents did
> not show the image associated with the signature. They gave me permission to
> share this document, and you will find it attached
> ([^PDFBOX-issue-rendering-signature.pdf]).
> The problem is on the last page, page 9. The issue can easily be reproduced
> using pdfbox-app-2.0*.jar PDFToImage.
> Result with pdfbox 2.0.22 is:
> !pdfbox22-page9-br.jpg!
> Result with pdfbox 2.0.23 or later is:
> !pdfbox23-page9-br.jpg!
> The regression was introduced with commit (seen in git)
> [f34a33824c4363b9b683245cb582328dc92b79ca|https://github.com/apache/pdfbox/commit/f34a33824c4363b9b683245cb582328dc92b79ca],
> dated 2021-03-02 07:12:11+0000. The associated ticket was PDFBOX-5112.
> The issue is in PDFXrefStreamParser's ObjectNumbers constructor, as it
> assumes that the COSInteger objects in the COSArray are necessarily sorted.
> In the attached PDF they are not, and this causes the parser to stop
> iterating over the array too early.
> I have a patch for that on branch 2.0:
> [^Fixing_the_problem_when_the_COSArray_is_not_sorted_in_increasing_order_.patch]
> With this patch the image is created successfully. However, warnings appear
> that did not exist in version 2.0.22:
> {noformat}
> Jun 16, 2023 5:18:29 PM org.apache.pdfbox.pdfparser.COSParser findObjectKey
> WARNING: found wrong object number. expected [6789] found [6791]
> Jun 16, 2023 5:18:29 PM org.apache.pdfbox.pdfparser.COSParser findObjectKey
> WARNING: found wrong object number. expected [6790] found [5327]
> Jun 16, 2023 5:18:29 PM org.apache.pdfbox.pdfparser.COSParser findObjectKey
> WARNING: found wrong object number. expected [6791] found [6485]
> Jun 16, 2023 5:18:29 PM org.apache.pdfbox.pdfparser.COSParser findObjectKey
> WARNING: found wrong object number. expected [6485] found [6789]
> {noformat}
> There may be additional fixes to be made in order to fully support this PDF.
> I did not have time to investigate, and my knowledge of the codebase is
> fairly limited, so help would be appreciated here.
> Thanks.