[ 
https://issues.apache.org/jira/browse/PDFBOX-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394166#comment-16394166
 ] 

Andreas Lehmkühler commented on PDFBOX-4097:
--------------------------------------------

This is not true for all compressed streams. The given PDF is encrypted, and so 
are the compressed streams. The brute force mechanism can't handle such streams.
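
To illustrate why a brute force pass cannot read such streams, here is a minimal Java sketch. It is not PDFBox code: the class and method names are hypothetical, and a single-byte XOR is used only as a toy stand-in for the real PDF stream encryption. Flate data that is still encrypted does not inflate; once decrypted, it does.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration class; not part of PDFBox.
class EncryptedStreamSketch {

    /** Flate-compresses data in zlib format, the format /FlateDecode expects. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Returns the inflated bytes, or null if the input is not valid zlib data. */
    static byte[] tryInflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    return null; // truncated or otherwise unusable input
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            return null; // what a parser sees when the stream is still encrypted
        } finally {
            inflater.end();
        }
    }

    /** Toy cipher (assumption of this sketch); applying it twice restores the input. */
    static byte[] xor(byte[] data, byte key) {
        byte[] result = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            result[i] = (byte) (data[i] ^ key);
        }
        return result;
    }
}
```

In this sketch, `tryInflate` returns null for the encrypted bytes and the original content once `xor` has been applied again with the same key, which mirrors the need to decrypt before decoding.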

> Compressed object will lost when brute force search failed to handle 
> compressed streams
> ---------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4097
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4097
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.8
>            Reporter: Cheng Zhong
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>         Attachments: 奥美医疗-IPO.pdf
>
>
> Compressed objects described in cross-reference streams are lost when the 
> brute force search fails to handle the object streams that contain them.
> The attached PDF has an object 1336 whose xref offset actually points to 
> object 1828. This inconsistency leads to a brute force search (triggered by 
> *COSParser.checkXrefOffsets*).
> During the search (in *bfSearchForObjStreams*), object streams 1828, 1829 and 
> 1830 fail to decompress because they appear "corrupted" (yes, the *Params* 
> field is missing from the dictionary, or the *Filter* is wrong). As a result, 
> 462 compressed objects described in the cross-reference streams are lost. 
> Since important objects (the Root, the Pages, etc.) refer to objects in 1828 
> or the other failed streams, they all resolve to null (because the corrected 
> xref offsets don't contain them). Further parsing is impossible.
> However, when I bypassed *checkXrefOffsets*, the PDF displayed correctly 
> without any (noticeable) error. It seems that object 1336 is not used in the 
> PDF.
> "Corrupted" 1828:
> {code:java}
> 1828 0 obj
> <<
> /Length 2176
> /Type /ObjStm
> /N 200
> /First 2103
> /Filter /FlatDecode
> >>
> ...{code}
> This stream fails to decode in *bfSearchForObjStreams* but is parsed 
> successfully by *parseObjectStream*.
>  
> Would it be nice to have a fallback that preserves the key offsets of 
> compressed stream objects when an error occurs during the brute force search?
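
The fallback asked about above could be sketched roughly as follows. This is an illustration only, not PDFBox code: the class, the method, and the convention that a negative entry -N marks an object compressed in object stream N are all assumptions of this sketch. The idea is to keep the original xref entries of compressed objects whose object stream was located by the brute force search but could not be decoded there.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of PDFBox.
class XrefFallbackSketch {

    /**
     * Merges brute force results with the original xref entries.
     *
     * Sketch convention (an assumption): a positive value is a byte offset,
     * a negative value -N means "compressed inside object stream N".
     *
     * @param original             object number to entry, from the original xref
     * @param bruteForce           object number to entry, from the brute force search
     * @param undecodableObjStreams object streams that were found but failed to decode
     */
    static Map<Integer, Long> mergeWithFallback(Map<Integer, Long> original,
                                                Map<Integer, Long> bruteForce,
                                                Set<Integer> undecodableObjStreams) {
        // Brute force results take precedence for everything they cover.
        Map<Integer, Long> merged = new HashMap<>(bruteForce);
        for (Map.Entry<Integer, Long> entry : original.entrySet()) {
            long value = entry.getValue();
            if (value >= 0) {
                continue; // uncompressed object: trust the brute force search
            }
            int streamNumber = (int) -value;
            // Keep the old entry only if its object stream is known to exist
            // and was merely undecodable during the brute force search.
            if (undecodableObjStreams.contains(streamNumber)
                    && bruteForce.containsKey(streamNumber)) {
                merged.putIfAbsent(entry.getKey(), value);
            }
        }
        return merged;
    }
}
```

With such a merge, objects stored in a stream like 1828 would stay resolvable for a later, regular parse of that stream, while offsets corrected by the brute force search (such as the one for object 1336) would still be used.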



