[ 
https://issues.apache.org/jira/browse/PDFBOX-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2163:
------------------------------------

    Description: 
This PDF
http://digitalcorpora.org/corp/nps/files/govdocs1/876/876636.pdf
causes an exception because the end of an inline image is improperly detected. The 
stream looks like this:
{code}
BI
  /W 452
  /H 169
  /BPC 8
  /CS /RGB
  /D [0.0 1.0 0.0 1.0 0.0 1.0]
  /F [/A85 /Fl]
ID
......................................................
....................................................EI
......................................................
...
....
EI Q
{code}

The inline images are handled in PDFStreamParser. This is tricky: we check 
whether binary data follows a candidate EI, to rule out an EI in the middle of 
the image, but here what follows isn't binary data, it's ASCII85 text. We also 
can't require an LF before the EI, because I remember a PDF at work, created 
by a well-known company, that doesn't use one.

  was:
This PDF
http://digitalcorpora.org/corp/nps/files/govdocs1/876/876636.pdf
has an exception which is because the end of an inline image is improperly 
detected. The stream looks like this:
{code}
BI
  /W 452
  /H 169
  /BPC 8
  /CS /RGB
  /D [0.0 1.0 0.0 1.0 0.0 1.0]
  /F [/A85 /Fl]
ID
......................................................
....................................................EI
......................................................
...
....
EI Q
{code}

The inline images are handled in PDFStreamParser. This is tricky, we look for 
followup bin data to check that it isn't an EI in the middle, but here it isn't 
bin data, but ascii85 stuff. We also can't request that there be a LF before 
the EI, because I remember that I had a PDF at work created by a well known 
company that doesn't use it.


> inline image with EI in the middle incorrectly parsed
> -----------------------------------------------------
>
>                 Key: PDFBOX-2163
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2163
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>            Reporter: Tilman Hausherr
>
> This PDF
> http://digitalcorpora.org/corp/nps/files/govdocs1/876/876636.pdf
> causes an exception because the end of an inline image is improperly detected. 
> The stream looks like this:
> {code}
> BI
>   /W 452
>   /H 169
>   /BPC 8
>   /CS /RGB
>   /D [0.0 1.0 0.0 1.0 0.0 1.0]
>   /F [/A85 /Fl]
> ID
> ......................................................
> ....................................................EI
> ......................................................
> ...
> ....
> EI Q
> {code}
> The inline images are handled in PDFStreamParser. This is tricky: we check 
> whether binary data follows a candidate EI, to rule out an EI in the middle 
> of the image, but here what follows isn't binary data, it's ASCII85 text. We 
> also can't require an LF before the EI, because I remember a PDF at work, 
> created by a well-known company, that doesn't use one.
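
To illustrate the problem described above, here is a minimal sketch of one possible heuristic. It is not the PDFStreamParser code, and the class and method names (InlineImageEndSketch, findEndOfInlineImage, looksLikeOperatorOrEnd) are made up for this example. Instead of testing for binary data after a candidate EI, it assumes a real EI is followed by whitespace and then by a short, operator-like token such as Q, or by the end of the stream. It does not require an LF before the EI.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of PDFBox.
class InlineImageEndSketch {

    // PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    // True if, after skipping whitespace at 'pos', the data ends or the next
    // token is at most 3 characters long and made of operator characters.
    // Assumption of this sketch: image data rarely continues with such a token.
    static boolean looksLikeOperatorOrEnd(byte[] data, int pos) {
        while (pos < data.length && isWhitespace(data[pos])) {
            pos++;
        }
        if (pos == data.length) {
            return true;
        }
        int len = 0;
        while (pos + len < data.length && !isWhitespace(data[pos + len])) {
            int c = data[pos + len];
            boolean operatorChar = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || c == '*' || c == '\'' || c == '"';
            if (!operatorChar || ++len > 3) {
                return false;
            }
        }
        return true;
    }

    // Returns the index of the 'E' of the EI that ends the image data,
    // searching from 'from' (the first byte after ID), or -1 if none is found.
    static int findEndOfInlineImage(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == 'E' && data[i + 1] == 'I') {
                boolean wsOrEnd = i + 2 == data.length || isWhitespace(data[i + 2]);
                if (wsOrEnd && looksLikeOperatorOrEnd(data, i + 2)) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up stream: an EI at the end of an ASCII85-like line, then the real EI.
        String stream = "ID\nabcdEI\nfghijklm~>\nEI Q\n";
        byte[] data = stream.getBytes(StandardCharsets.US_ASCII);
        System.out.println("first EI at " + stream.indexOf("EI")
                + ", image ends at " + findEndOfInlineImage(data, 2));
    }
}
```

This sketch has obvious limits: it rejects a real EI that is followed by operands (for example numbers before a cm operator), and a short run of letters inside the image data can still produce a false match.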



--
This message was sent by Atlassian JIRA
(v6.2#6252)
