[
https://issues.apache.org/jira/browse/PDFBOX-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tilman Hausherr updated PDFBOX-2163:
------------------------------------
Attachment: PDFBOX-2163-029016.pdf
The attached file has this:
{code}
EI<NL>DB'Z[<TAB>8F
{code}
so the part after EI was considered "not binary". I have therefore improved the
code once again: within the "not binary" part (which I have now set to 10
bytes), the EI and any space characters must be followed by 1-3 non-space
characters. This is probably still not the end of it; the next step would be to
require that the non-space character sequence be a valid PDF operator. The
change was committed in rev 1613645 for the trunk and rev 1613646 for the 1.8
branch.
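
For illustration only, here is a minimal sketch of one possible reading of the check described above. It is not the committed PDFBox code: the class name, method names, the whitespace and "binary" definitions, and the handling of end-of-data are all assumptions made for this example. It inspects up to 10 bytes after an EI, rejects the candidate if any of them look binary, and otherwise requires that the first non-space run is 1-3 characters long.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and details are assumptions, not PDFBox API.
public class InlineImageEndSketch {
    // Number of bytes inspected after "EI" (10, as mentioned in the comment).
    static final int WINDOW = 10;

    // Simplified whitespace set (assumption): space, tab, LF, CR.
    static boolean isWhitespace(int b) {
        return b == 0x20 || b == 0x09 || b == 0x0A || b == 0x0D;
    }

    // Simplified "binary" test (assumption): anything that is neither
    // whitespace nor printable ASCII.
    static boolean isBinary(int b) {
        return !isWhitespace(b) && (b < 0x21 || b > 0x7E);
    }

    // afterEi is the index of the first byte following the candidate "EI".
    static boolean looksLikeEndOfImage(byte[] data, int afterEi) {
        int end = Math.min(data.length, afterEi + WINDOW);
        if (afterEi >= end) {
            return true; // assumption: EI at the very end of the data is accepted
        }
        // Step 1: no byte in the window may look binary.
        for (int i = afterEi; i < end; i++) {
            if (isBinary(data[i] & 0xFF)) {
                return false;
            }
        }
        // Step 2: skip space characters, then measure the first non-space run.
        int i = afterEi;
        while (i < end && isWhitespace(data[i] & 0xFF)) {
            i++;
        }
        int start = i;
        while (i < end && !isWhitespace(data[i] & 0xFF)) {
            i++;
        }
        // A run cut off by the window edge is measured only up to that edge.
        int tokenLength = i - start;
        return tokenLength >= 1 && tokenLength <= 3;
    }

    public static void main(String[] args) {
        byte[] middle = "\nDB'Z[\t8F".getBytes(StandardCharsets.US_ASCII);
        byte[] real = " Q\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeEndOfImage(middle, 0));
        System.out.println(looksLikeEndOfImage(real, 0));
    }
}
```

With the bytes from the attached file, the run DB'Z[ after the newline is five characters long, so this sketch rejects that EI, while an EI followed by " Q" is accepted. Checking the run against a list of valid PDF operators would be the stricter next step mentioned above.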
> inline image with EI in the middle incorrectly parsed
> -----------------------------------------------------
>
> Key: PDFBOX-2163
> URL: https://issues.apache.org/jira/browse/PDFBOX-2163
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.6, 1.8.7, 2.0.0
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Labels: inline
> Fix For: 1.8.7, 2.0.0
>
> Attachments: PDFBOX-2163-029016.pdf
>
>
> This PDF
> http://digitalcorpora.org/corp/nps/files/govdocs1/876/876636.pdf
> causes an exception because the end of an inline image is detected incorrectly.
> The stream looks like this:
> {code}
> BI
> /W 452
> /H 169
> /BPC 8
> /CS /RGB
> /D [0.0 1.0 0.0 1.0 0.0 1.0]
> /F [/A85 /Fl]
> ID
> ......................................................
> ....................................................EI
> ......................................................
> ...
> ....
> EI Q
> {code}
> Inline images are handled in PDFStreamParser. This is tricky: we check whether
> binary data follows an EI to make sure it isn't an EI in the middle of the
> image data, but here what follows isn't binary data, it's ASCII85 text. We also
> can't require an LF before the EI, because I remember a PDF at work, created
> by a well-known company, that doesn't use one.