[ https://issues.apache.org/jira/browse/PDFBOX-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419077#comment-17419077 ]
Michael Klink commented on PDFBOX-5283:
---------------------------------------

{quote}True, the PDF is in some ways broken. But currently the second reference is read, which should be object 9 in the table.{quote}

Yes. And what now? The information in your PDF is contradictory. So different PDF processors are likely to parse the PDF differently. GIGO.

Nonetheless, you might be in luck, PDFBox maintainers have a tendency to try and handle broken PDFs in a similar way as Adobe software does.

IMO such PDFs should be rejected, repairs under the hood simply are attack vectors for forgery.

> No Content - xRef / Obj Parsing
> -------------------------------
>
>                 Key: PDFBOX-5283
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5283
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.24, 3.0.0 PDFBox
>            Reporter: Oliver Schmidtmer
>            Priority: Major
>         Attachments: Lieferschein_110300.pdf
>
>
> There seems to be an issue with xRef / object reading when parsing the
> attached pdf.
> The PDF itself has for example two objects with the ref "8 0 R":
> One at position 1967 with a "/Content" entry.
> One at position 7782 without a "/Content" entry.
> Both are referenced in the XRef Table, so there seems to be something off.
> Probably Acrobat, etc. are using the first object, while PDFBox is using the
> second one.