Cheng Zhong created PDFBOX-4097:
-----------------------------------
Summary: Compressed objects are lost when brute force search
fails to handle compressed streams
Key: PDFBOX-4097
URL: https://issues.apache.org/jira/browse/PDFBOX-4097
Project: PDFBox
Issue Type: Bug
Components: Parsing
Affects Versions: 2.0.8
Reporter: Cheng Zhong
Attachments: 奥美医疗-IPO.pdf
Compressed objects described in cross-reference streams are lost when the brute
force search fails to handle the object streams that contain them.
The attached PDF has an object 1336 whose cross-reference offset actually
points to object 1828. This inconsistency triggers a brute force search
(started by *COSParser.checkXrefOffsets*).
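To illustrate the kind of consistency check described above, here is a minimal sketch, not the actual PDFBox code. The class and method names are hypothetical, and the check is deliberately simplified: it only tests whether the bytes at a given offset begin with the expected "N G obj" header, without tolerating extra whitespace as a real parser might.

```java
import java.nio.charset.StandardCharsets;

public class XrefOffsetCheckSketch {

    /**
     * Returns true if the bytes at the given offset start with the
     * header "<objNumber> <generation> obj". Hypothetical helper,
     * simplified for illustration.
     */
    public static boolean pointsAtObject(byte[] pdf, int offset,
                                         int objNumber, int generation) {
        byte[] expected = (objNumber + " " + generation + " obj")
                .getBytes(StandardCharsets.ISO_8859_1);
        // Reject offsets that fall outside the file.
        if (offset < 0 || (long) offset + expected.length > pdf.length) {
            return false;
        }
        // Compare the header byte by byte.
        for (int i = 0; i < expected.length; i++) {
            if (pdf[offset + i] != expected[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With a file where the offset recorded for object 1336 actually lands on "1828 0 obj", this check returns false for 1336 and true for 1828, which is the inconsistency reported here.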
During the search (in *bfSearchForObjStreams*), object streams 1828, 1829, and
1830 failed to decompress because the streams are "corrupted" (yes, the
*Params* field is missing from the dictionary, or the *Filter* is wrong). As a
result, the 462 compressed objects described in the cross-reference streams are
lost. Important objects (the Root, the Pages, etc.) refer to objects stored in
1828 or the other failed streams, so they all resolve to null, because the
corrected xref offsets don't contain them. Further parsing is impossible.
However, when I bypassed *checkXrefOffsets*, the PDF displayed correctly
without any (noticeable) error. It seems that object 1336 is not used in the
PDF.
"Corrupted" 1828:
{code:java}
1828 0 obj
<<
/Length 2176
/Type /ObjStm
/N 200
/First 2103
/Filter /FlatDecode
>>
...{code}
This stream is not handled correctly in *bfSearchForObjStreams*, but it is
parsed successfully by *parseObjectStream*.
Would it be nice to have a fallback that preserves the compressed object key
offsets when an error occurs during the brute force search?
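A rough sketch of what such a fallback could look like, using only plain Java collections rather than the PDFBox classes. Everything here is an assumption made for illustration: the class and method names are hypothetical, and the table convention (a positive value is a byte offset, a negative value -n means "compressed inside object stream n") is simply the model chosen for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class XrefFallbackSketch {

    /**
     * Merges the brute force result with the original xref entries.
     * For every object stream that could not be read during the brute
     * force search, the original entries of its compressed objects
     * are kept instead of being dropped.
     */
    public static Map<Integer, Long> mergeWithFallback(
            Map<Integer, Long> original,
            Map<Integer, Long> bruteForce,
            Set<Integer> failedObjStreams) {
        // Start from what the brute force search found.
        Map<Integer, Long> result = new HashMap<>(bruteForce);
        for (Map.Entry<Integer, Long> entry : original.entrySet()) {
            long value = entry.getValue();
            // Negative value: object is compressed in stream number -value.
            if (value < 0 && failedObjStreams.contains((int) (-value))) {
                // Keep the original entry unless brute force found the object.
                result.putIfAbsent(entry.getKey(), value);
            }
        }
        return result;
    }
}
```

Under this model, the objects stored in streams 1828, 1829, and 1830 would keep their original entries, so references from the Root or the Pages could still be resolved later when the stream is actually parsed.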