[ https://issues.apache.org/jira/browse/PDFBOX-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419077#comment-17419077 ]

Michael Klink commented on PDFBOX-5283:
---------------------------------------

{quote}True, the PDF is in some ways broken. But currently the second reference 
is read, which should be object 9 in the table.{quote}

Yes. And what now?

The information in your PDF is contradictory, so different PDF processors are 
likely to parse it differently. GIGO (garbage in, garbage out).
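To illustrate why, here is a minimal sketch in Java. The class and method names are hypothetical and this is not PDFBox API. It assumes a simplified, uncompressed body with no streams or strings that contain "endobj". It collects every definition of each object number and resolves a duplicate under two equally plausible policies: first definition wins, or last definition wins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateObjectPolicies {
    // Simplified matcher for "num gen obj ... endobj" (hypothetical sketch:
    // ignores streams, strings and comments that a real parser must handle).
    private static final Pattern OBJ =
        Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj(.*?)endobj", Pattern.DOTALL);

    // Collects every body found for each "num gen" key, in file order.
    static Map<String, List<String>> scan(String pdfText) {
        Map<String, List<String>> defs = new LinkedHashMap<>();
        Matcher m = OBJ.matcher(pdfText);
        while (m.find()) {
            String key = m.group(1) + " " + m.group(2);
            defs.computeIfAbsent(key, k -> new ArrayList<>()).add(m.group(3).trim());
        }
        return defs;
    }

    // Policy A: the first definition in the file wins.
    static String firstWins(Map<String, List<String>> defs, String key) {
        List<String> bodies = defs.get(key);
        return bodies == null ? null : bodies.get(0);
    }

    // Policy B: the last definition in the file wins.
    static String lastWins(Map<String, List<String>> defs, String key) {
        List<String> bodies = defs.get(key);
        return bodies == null ? null : bodies.get(bodies.size() - 1);
    }

    public static void main(String[] args) {
        // Hypothetical, heavily simplified body with object 8 defined twice.
        String body = "8 0 obj\n<< /Type /Page /Contents 9 0 R >>\nendobj\n"
                    + "9 0 obj\n<< /Length 0 >>\nendobj\n"
                    + "8 0 obj\n<< /Type /Page >>\nendobj\n";
        Map<String, List<String>> defs = scan(body);
        // prints "first wins: << /Type /Page /Contents 9 0 R >>"
        System.out.println("first wins: " + firstWins(defs, "8 0"));
        // prints "last wins:  << /Type /Page >>"
        System.out.println("last wins:  " + lastWins(defs, "8 0"));
    }
}
```

Neither policy is "correct" for such input; the file itself supports both readings, which is exactly why processors disagree.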

Nonetheless, you might be in luck: PDFBox maintainers tend to try to handle 
broken PDFs the same way Adobe software does.

IMO such PDFs should be rejected; silent repairs under the hood are simply 
attack vectors for forgery.
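What "rejecting" could look like, as a minimal Java sketch under the same simplifying assumptions (hypothetical names, not an existing PDFBox feature): a strict check that fails whenever the same object header appears more than once, instead of silently picking one definition. The reported offsets are character offsets; they match byte offsets only if the file was decoded as ISO-8859-1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictObjectCheck {
    // Simplified header matcher for "num gen obj"; a real parser must also
    // skip strings, comments and stream data, which this sketch does not do.
    private static final Pattern HEADER = Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b");

    // Returns the character offsets of every header, grouped by "num gen" key.
    static Map<String, List<Integer>> headerOffsets(String pdfText) {
        Map<String, List<Integer>> offsets = new TreeMap<>();
        Matcher m = HEADER.matcher(pdfText);
        while (m.find()) {
            String key = m.group(1) + " " + m.group(2);
            offsets.computeIfAbsent(key, k -> new ArrayList<>()).add(m.start());
        }
        return offsets;
    }

    // Rejects the input instead of "repairing" it by choosing one definition.
    static void requireUniqueObjects(String pdfText) {
        for (Map.Entry<String, List<Integer>> e : headerOffsets(pdfText).entrySet()) {
            if (e.getValue().size() > 1) {
                throw new IllegalArgumentException(
                    "object " + e.getKey() + " defined at offsets " + e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical body with object 8 defined twice.
        String body = "8 0 obj\n<< /Type /Page /Contents 9 0 R >>\nendobj\n"
                    + "8 0 obj\n<< /Type /Page >>\nendobj\n";
        try {
            requireUniqueObjects(body);
        } catch (IllegalArgumentException ex) {
            // prints "rejected: object 8 0 defined at offsets [0, 49]"
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

A stricter variant could additionally compare these offsets against the entries of the cross-reference table and reject any mismatch.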

> No Content - xRef / Obj Parsing
> -------------------------------
>
>                 Key: PDFBOX-5283
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5283
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.24, 3.0.0 PDFBox
>            Reporter: Oliver Schmidtmer
>            Priority: Major
>         Attachments: Lieferschein_110300.pdf
>
>
> There seems to be an issue with xref / object reading when parsing the 
> attached PDF.
> The PDF itself has, for example, two objects with the reference "8 0 R":
> one at position 1967 with a "/Content" entry, and
> one at position 7782 without a "/Content" entry.
> Both are referenced in the xref table, so something seems to be off. 
> Probably Acrobat and others use the first object, while PDFBox uses the 
> second one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)