[ 
https://issues.apache.org/jira/browse/PDFBOX-2441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180199#comment-14180199
 ] 

Andreas Lehmkühler commented on PDFBOX-2441:
--------------------------------------------

The simple algorithm wasn't the issue here; it still works perfectly ;-) 
I simply forgot to apply the check/repair mechanism to the offsets of object 
streams. With the last commit the self-repair is in place and it works. BUT 
there is another issue with at least one of the streams: I got a 
DataFormatException while parsing the object stream 69 0 R. Any ideas?
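
For readers following along, here is a minimal sketch of the check/repair 
idea: verify that a stored offset really points at the expected object 
header and, if it does not, fall back to the nearest occurrence of that 
header. This is not the PDFBox code; the class XrefOffsetSketch and the 
methods isHeaderAt and checkOrRepair are hypothetical names, and the whole 
file is assumed to fit into a byte array.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only, not part of PDFBox.
class XrefOffsetSketch {

    // True if 'header' (e.g. "69 0 obj") starts exactly at 'pos'.
    // A preceding digit is rejected so "69 0 obj" does not match
    // inside "169 0 obj".
    static boolean isHeaderAt(byte[] data, byte[] header, int pos) {
        if (pos < 0 || pos + header.length > data.length) {
            return false;
        }
        if (pos > 0 && data[pos - 1] >= '0' && data[pos - 1] <= '9') {
            return false;
        }
        for (int i = 0; i < header.length; i++) {
            if (data[pos + i] != header[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns storedOffset if it is valid, otherwise the offset of the
    // header occurrence nearest to storedOffset, or -1 if none exists.
    static int checkOrRepair(byte[] data, String objectHeader, int storedOffset) {
        byte[] header = objectHeader.getBytes(StandardCharsets.ISO_8859_1);
        if (isHeaderAt(data, header, storedOffset)) {
            return storedOffset;
        }
        int best = -1;
        for (int pos = 0; pos + header.length <= data.length; pos++) {
            if (isHeaderAt(data, header, pos)
                    && (best < 0
                        || Math.abs(pos - storedOffset) < Math.abs(best - storedOffset))) {
                best = pos;
            }
        }
        return best;
    }
}
```

A nearest-match search like this can pick the wrong candidate when the same 
object number appears more than once in the file, which is the situation 
the issue description below mentions for files with more than one xref 
table.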



> Improve XRef self healing mechanism when more than one xref table
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-2441
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2441
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.7, 1.8.8, 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.8.8, 2.0.0
>
>         Attachments: 260105.pdf
>
>
> This is a follow-up issue to PDFBOX-2250:
> {quote}
> the xref repair algorithm simply searches for the nearest offset, which may 
> fail if more than one xref table is present
> ...
> Once we have a sample pdf which can't be parsed with the simple algorithm, we 
> can open a new issue.
> {quote}
> And here's one:
> {code}
> Exception in thread "main" java.io.IOException: Error: Expected a long type at offset 1180, instead got '50/Filter/FlateDecode/DecodeParms'
>         at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1690)
> {code}
> That file does have more than one xref table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
