[ https://issues.apache.org/jira/browse/PDFBOX-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler updated PDFBOX-4097:
---------------------------------------
    Fix Version/s: 3.0.0 PDFBox
                   2.0.10

> Compressed object will lost when brute force search failed to handle
> compressed streams
> ---------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4097
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4097
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.8
>            Reporter: Cheng Zhong
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.10, 3.0.0 PDFBox
>
>         Attachments: 奥美医疗-IPO.pdf
>
>
> Compressed objects described in cross-reference streams are lost when the
> brute force search fails to handle such streams.
> The attached PDF has an object 1336, but its offset points to object 1828.
> This inconsistency triggers a brute force search (started by
> *COSParser.checkXrefOffsets*).
> During the search (in *bfSearchForObjStreams*), object streams 1828, 1829
> and 1830 fail to decompress because the streams are "corrupted" (yes, the
> *Params* field is missing from the dictionary or the *Filter* is wrong).
> As a result, 462 compressed objects described in cross-reference streams
> are lost. Since important objects (the Root, the Pages, etc.) refer to
> objects in streams such as 1828, they all resolve to null (because the
> corrected XRefOffsets doesn't have them). Further parsing is impossible.
> However, when I tried to bypass *checkXrefOffsets*, the PDF displayed
> correctly without any (noticeable) error. It seems that object 1336 is not
> used in the PDF.
> "Corrupted" 1828:
> {code:java}
> 1828 0 obj
> <<
> /Length 2176
> /Type /ObjStm
> /N 200
> /First 2103
> /Filter /FlatDecode
> >>
> ...{code}
> It doesn't work well in *bfSearchForObjStreams* but works in
> *parseObjectStream*.
>
> Would it be nice to have a fallback that preserves the compressed stream
> object key offsets when an error occurs during the brute force search?
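
The fallback asked for at the end of the report could look roughly like the sketch below. It is only an illustration and not the fix that was committed for this issue. It uses plain Java maps instead of PDFBox types, and the class name XrefFallbackSketch, the method mergeWithFallback and the failedObjStreams set are all hypothetical. It also assumes a convention in which a positive table value is a byte offset and a negative value is the negated number of the object stream that holds a compressed object.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: keep the original xref entries for compressed objects
// whose object stream could not be decoded during the brute force search.
class XrefFallbackSketch {

    // Assumed convention (not taken from the report):
    //   value > 0  -> byte offset of an uncompressed object
    //   value < 0  -> -(number of the object stream containing the object)
    static Map<Long, Long> mergeWithFallback(Map<Long, Long> original,
                                             Map<Long, Long> bruteForce,
                                             Set<Long> failedObjStreams) {
        // Start from whatever the brute force search managed to find.
        Map<Long, Long> merged = new HashMap<>(bruteForce);
        for (Map.Entry<Long, Long> entry : original.entrySet()) {
            long value = entry.getValue();
            boolean compressed = value < 0;
            // Only restore entries that point into a stream the search failed on;
            // entries the search did recover are left untouched.
            if (compressed && failedObjStreams.contains(-value)) {
                merged.putIfAbsent(entry.getKey(), value);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Long, Long> original = new HashMap<>();
        original.put(1L, -1828L);   // compressed object stored in stream 1828
        original.put(1336L, 9000L); // inconsistent offset that triggers the search

        Map<Long, Long> bruteForce = new HashMap<>();
        bruteForce.put(1828L, 9000L); // the stream itself was located

        Set<Long> failed = new HashSet<>();
        failed.add(1828L);            // but its content could not be decoded

        System.out.println(mergeWithFallback(original, bruteForce, failed));
    }
}
```

With this approach the entry for object 1 survives the search, so a later attempt to read stream 1828 (for example through a more tolerant code path) still has something to resolve. The stale entry for object 1336 is dropped because it is not a compressed-object entry.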