[ 
https://issues.apache.org/jira/browse/PDFBOX-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419586#comment-17419586
 ] 

Andreas Lehmkühler commented on PDFBOX-5283:
--------------------------------------------

Your patch might work in your case, but as Tilman already mentioned, it won't 
work in other cases, so we have to decide which strategy to use in such cases.

Brute-force parsing comes into play if the PDF is somehow corrupt. It simply 
collects all valid objects, starting from the beginning of the file. If multiple 
objects use the same key, the latest occurrence is most likely the correct one, 
because later objects are supposed to update earlier versions of an object. That 
said, I'm afraid the current implementation is the better choice.
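
To illustrate the "latest occurrence wins" strategy, here is a minimal sketch. 
It is not the actual PDFBox code: the class name BruteForceScanSketch, the 
method collectOffsets and the regex-based scan are assumptions made purely for 
illustration, and the sample data only mimics the duplicate "8 0" object from 
the report.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not PDFBox's implementation.
class BruteForceScanSketch {
    // Matches an object header such as "8 0 obj" (object number, generation, keyword).
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?<![0-9])(\\d+)\\s+(\\d+)\\s+obj\\b");

    /**
     * Scans the content front to back and records the offset of every
     * "N G obj" header. A later header with the same key overwrites the
     * earlier entry, so the latest occurrence wins.
     */
    static Map<String, Integer> collectOffsets(String content) {
        Map<String, Integer> offsets = new LinkedHashMap<>();
        Matcher m = OBJ_HEADER.matcher(content);
        while (m.find()) {
            String key = m.group(1) + " " + m.group(2);
            offsets.put(key, m.start()); // later occurrence replaces earlier one
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Two objects share the key "8 0"; only the second one is kept.
        String pdf = "8 0 obj\n<< /Contents 9 0 R >>\nendobj\n"
                   + "8 0 obj\n<< >>\nendobj\n";
        System.out.println(collectOffsets(pdf));
    }
}
```

With this strategy the second "8 0" object shadows the first one, which is the 
right result for regular incremental updates but not for the attached file.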

In theory there may be other ways to repair such malformed PDFs, but 
they would be more complex, and I'm not sure it would be worth implementing 
such an algorithm to repair just one or maybe a couple of PDFs.



> No Content - xRef / Obj Parsing
> -------------------------------
>
>                 Key: PDFBOX-5283
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5283
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.24, 3.0.0 PDFBox
>            Reporter: Oliver Schmidtmer
>            Priority: Major
>         Attachments: Lieferschein_110300.pdf
>
>
> There seems to be an issue with xRef / object reading when parsing the 
> attached pdf.
> The PDF itself has for example two objects with the ref "8 0 R":
> One at position 1967 with a "/Content" entry.
> One at position 7782 without a "/Content" entry.
> Both are referenced in the XRef Table, so there seems to be something off. 
> Probably Acrobat, etc. are using the first object, while PDFBox is using the 
> second one.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
