[
https://issues.apache.org/jira/browse/PDFBOX-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andreas Lehmkühler resolved PDFBOX-383.
---------------------------------------
Resolution: Fixed
Fix Version/s: 1.7.0
Assignee: Andreas Lehmkühler
The attached PDFs work fine with the new non-sequential parser, see PDFBOX
for details.
> BaseParser incorrectly handling stream, exhibiting IOException
> --------------------------------------------------------------
>
> Key: PDFBOX-383
> URL: https://issues.apache.org/jira/browse/PDFBOX-383
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 0.7.3
> Environment: PDFBox 0.7.3 with Java 5 running on Windows
> Reporter: Son
> Assignee: Andreas Lehmkühler
> Fix For: 1.7.0
>
> Attachments: BaseParser.java, fail.pdf
>
>
> When loading a PDF file containing a file attachment annotation, errors may
> occur when two conditions hold:
> - the Length value in the dictionary of the F stream is an indirect reference
> to an integer value
> - the content of the filtered stream contains the word 'endstream'
> Typically this occurs when the PDF file contains a stream description as
> follows:
> 12 0 obj
> << /Length 16 0 R
> /Filter /FlateDecode
> >>
> stream
> {content}
> endstream
> endobj
> ...
> 16 0 obj
> {length}
> endobj
> ....
> and the (filtered) {content} contains the byte sequence "endstream"
> (see line 3700 of the attachment).
> The problem is related to the way stream content is (always) read by the
> method readUntilEndStream(), which stops at the first 'endstream' sequence.
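The truncation described above can be reproduced without PDFBox. The following is a minimal sketch, not the BaseParser code: the class NaiveEndstreamScan and its methods are hypothetical names chosen for illustration. It contrasts scanning for the first 'endstream' with reading exactly {length} bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; none of these names come from PDFBox.
final class NaiveEndstreamScan {
    private static final byte[] MARKER =
            "endstream".getBytes(StandardCharsets.ISO_8859_1);

    /** Index of the first MARKER occurrence at or after 'from', or -1 if absent. */
    static int indexOfMarker(byte[] data, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - MARKER.length; i++) {
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Naive strategy: everything from 'start' up to the first marker. */
    static byte[] readUntilFirstMarker(byte[] data, int start) {
        int end = indexOfMarker(data, start);
        return Arrays.copyOfRange(data, start, end < 0 ? data.length : end);
    }

    /** Length-based strategy: exactly 'length' bytes from 'start'. */
    static byte[] readByLength(byte[] data, int start, int length) {
        return Arrays.copyOfRange(data, start, start + length);
    }

    public static void main(String[] args) {
        String payload = "AAendstreamBB"; // stream content that contains the marker
        String header = "stream\n";
        byte[] file = (header + payload + "\nendstream\nendobj\n")
                .getBytes(StandardCharsets.ISO_8859_1);

        byte[] naive = readUntilFirstMarker(file, header.length());
        byte[] exact = readByLength(file, header.length(), payload.length());

        // The naive read stops inside the payload; the length-based read does not.
        System.out.println("naive read: " + naive.length + " bytes");
        System.out.println("length-based read: " + exact.length + " bytes");
    }
}
```

The length-based read is only possible once the Length value has been resolved, which is exactly what is missing while the indirect reference is still unread.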
> A (partial) fix was made that reads the stream content in three different ways:
> - if the Length is known (it is a direct object), {length} bytes are
> read and written to the stream's FilteredStream
> - if the Length is unknown and the filter is FlateFilter, the code
> decodes the data (the FlateDecode algorithm does not require knowing the
> length of the encoded data ahead of time) and assigns the result to the
> stream's unfiltered stream
> - otherwise, the current behavior is kept
> Running the modified code on the files that exhibited errors fixed the
> problems that were encountered.
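The three-way strategy from the description can be sketched as follows. This is an illustration under stated assumptions, not the attached BaseParser.java patch: StreamReadSketch, Strategy, choose, inflateFrom and deflateStored are hypothetical names, and only java.util.zip from the JDK is used. It relies on the fact that a zlib stream marks its own end, so java.util.zip.Inflater can stop decoding without being told the encoded length.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch; none of these names come from PDFBox.
final class StreamReadSketch {
    enum Strategy { BY_LENGTH, INFLATE, SCAN_FOR_ENDSTREAM }

    /** Picks a strategy; directLength is null when Length is an indirect reference. */
    static Strategy choose(Integer directLength, String filterName) {
        if (directLength != null) {
            return Strategy.BY_LENGTH;
        }
        if ("FlateDecode".equals(filterName)) {
            return Strategy.INFLATE;
        }
        return Strategy.SCAN_FOR_ENDSTREAM;
    }

    /** Decodes zlib data beginning at 'start'; bytes after the zlib stream are ignored. */
    static byte[] inflateFrom(byte[] data, int start) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data, start, data.length - start);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported Flate data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt Flate data", e);
        } finally {
            inflater.end();
        }
    }

    /** Test helper: zlib-wraps 'raw' with NO_COMPRESSION so the bytes stay visible. */
    static byte[] deflateStored(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.NO_COMPRESSION);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                out.write(buf, 0, deflater.deflate(buf));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] payload = "abc endstream xyz".getBytes(StandardCharsets.ISO_8859_1);
        byte[] encoded = deflateStored(payload);
        byte[] tail = "\nendstream\nendobj\n".getBytes(StandardCharsets.ISO_8859_1);

        // Simulate the file bytes that follow the 'stream' keyword.
        byte[] file = Arrays.copyOf(encoded, encoded.length + tail.length);
        System.arraycopy(tail, 0, file, encoded.length, tail.length);

        byte[] decoded = inflateFrom(file, 0);
        System.out.println(choose(null, "FlateDecode"));
        System.out.println(Arrays.equals(payload, decoded));
    }
}
```

In a real parser the number of encoded bytes consumed would also be needed to reposition the input after the stream; Inflater.getTotalIn() reports that count.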