Andrew Olsen created PDFBOX-2016:
------------------------------------

             Summary: Stream parsing still incorrect if length value is wrong
                 Key: PDFBOX-2016
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2016
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 1.8.4, 1.6.0
            Reporter: Andrew Olsen


>From issue PDFBOX-1333 - "In 1.7.0 stream parsing in BaseParser was optimized 
>to use length value if available. The advantage is faster parsing and 
>independence of 'endstream' bytes sequences in stream. However the 
>disadvantage is that streams with wrong length values cannot be parsed 
>anymore" - etc. 

That issue was marked as fixed now that a COSStream can once again be parsed 
by reading all the way to 'endstream'. However, the resulting COSStream object 
still reports the expected (declared) length, not the true length. When the 
COSStream is parsed with a PDFStreamParser, the call to 
COSStream#getUnfilteredStream uses getLength() instead of getLengthWritten() 
to limit the amount of data that can be read. This can truncate the stream, so 
incorrect length values still lead to missing data, which limits the 
usefulness of the earlier fix. Changing the call to use getLengthWritten() 
should solve the problem.
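
To illustrate the truncation, here is a minimal standalone sketch. It does not 
use the PDFBox classes: StreamLengthSketch, readLimited, declaredLength and 
lengthWritten are hypothetical names that only mimic the effect of limiting a 
read by the declared length versus the number of bytes actually stored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the PDFBox implementation.
class StreamLengthSketch {

    // Reads at most 'limit' bytes from 'in', mimicking a length-limited stream.
    static byte[] readLimited(InputStream in, long limit) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        long remaining = limit;
        try {
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    break; // end of data reached before the limit
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Bytes the parser actually stored by reading up to 'endstream'.
        byte[] stored = "BT /F1 12 Tf (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);

        long declaredLength = 10;           // assumed wrong /Length value from the file
        long lengthWritten = stored.length; // true number of bytes stored

        byte[] byDeclared = readLimited(new ByteArrayInputStream(stored), declaredLength);
        byte[] byWritten = readLimited(new ByteArrayInputStream(stored), lengthWritten);

        System.out.println("limited by declared length: " + byDeclared.length + " bytes");
        System.out.println("limited by length written:  " + byWritten.length + " bytes");
    }
}
```

With the too-small declared length, only the first 10 bytes come back and the 
rest of the content stream is lost. Limiting by the number of bytes written 
returns the data in full.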





--
This message was sent by Atlassian JIRA
(v6.2#6252)
