Tilman Hausherr created PDFBOX-2163:
---------------------------------------
Summary: inline image with EI in the middle incorrectly parsed
Key: PDFBOX-2163
URL: https://issues.apache.org/jira/browse/PDFBOX-2163
Project: PDFBox
Issue Type: Bug
Components: Parsing
Reporter: Tilman Hausherr
This PDF
http://digitalcorpora.org/corp/nps/files/govdocs1/876/876636.pdf
causes an exception because the end of an inline image is detected
incorrectly. The stream looks like this:
{code}
BI
/W 452
/H 169
/BPC 8
/CS /RGB
/D [0.0 1.0 0.0 1.0 0.0 1.0]
/F [/A85 /Fl]
ID
......................................................
....................................................EI
......................................................
...
....
EI Q
{code}
Inline images are handled in PDFStreamParser. This is tricky: to make sure
that an EI is not in the middle of the image data, we look ahead for more
binary data after it. Here, however, the data isn't binary, it is ASCII85
text. We also can't require a line feed before the EI, because I remember a
PDF at work, created by a well-known company, that doesn't have one.
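A minimal standalone sketch of the lookahead heuristic described above, to show why ASCII85 data defeats it. This is not PDFBox's actual PDFStreamParser code: the class and method names (InlineImageEndCheck, isEndOfImage, findEndOfImage) and the 10-byte lookahead window are hypothetical. The sketch accepts an "EI" only if it is followed by whitespace and the next few bytes contain nothing outside printable ASCII. With binary image data a stray "EI" is rejected, but with ASCII85 text every following byte is printable, so the "EI" in the middle is accepted.

{code:java}
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch (not PDFBox code) of the "look ahead for binary data"
 * check used to decide whether an "EI" really ends an inline image.
 */
public class InlineImageEndCheck {

    /** Number of bytes inspected after a candidate "EI" (assumed value). */
    static final int LOOKAHEAD = 10;

    /**
     * Treats the "EI" at eiPos as the end of the image if it is followed by
     * whitespace (or end of data) and the next LOOKAHEAD bytes contain no
     * byte that looks binary (non-printable and not whitespace).
     */
    static boolean isEndOfImage(byte[] data, int eiPos) {
        int after = eiPos + 2;
        if (after < data.length && !isWhitespace(data[after])) {
            return false; // e.g. "EIx" is not a standalone EI token
        }
        int end = Math.min(data.length, after + LOOKAHEAD);
        for (int i = after; i < end; i++) {
            int b = data[i] & 0xFF;
            // A non-printable, non-whitespace byte suggests we are still
            // inside binary image data, so this EI is not the real end.
            if ((b < 0x20 || b > 0x7E) && !isWhitespace(data[i])) {
                return false;
            }
        }
        return true;
    }

    /** PDF whitespace: space, LF, CR, tab, form feed, NUL. */
    static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t' || b == '\f' || b == 0;
    }

    /** Returns the index of the first "EI" accepted by isEndOfImage, or -1. */
    static int findEndOfImage(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == 'E' && data[i + 1] == 'I' && isEndOfImage(data, i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Binary payload: the stray "EI" at index 1 is followed by non-ASCII
        // bytes, so it is skipped and the real "EI" at index 8 is found.
        byte[] binary = {'x', 'E', 'I', '\n', (byte) 0x9C, (byte) 0xFF, 0x01, '\n', 'E', 'I', ' ', 'Q'};
        // ASCII85-style payload: everything after the stray "EI" at index 5
        // is printable, so the check wrongly accepts it.
        byte[] ascii85 = "9jqo^EI\nBl[cGE<~>\nEI Q".getBytes(StandardCharsets.US_ASCII);

        System.out.println(findEndOfImage(binary, 0));  // prints 8
        System.out.println(findEndOfImage(ascii85, 0)); // prints 5
    }
}
{code}

The second result (5 instead of 18) is the misdetection this issue is about: the lookahead only helps when the image data is binary, and an /A85 filter makes the data printable.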