[ 
https://issues.apache.org/jira/browse/PDFBOX-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tilman Hausherr updated PDFBOX-2163:
------------------------------------

    Attachment: PDFBOX-2163-029016.pdf

The attached file has this:
{code}
EI<NL>DB'Z[<TAB>8F 
{code}
so the part after EI was considered "not binary". I have therefore improved the 
code once again: in addition to being "not binary", the part after EI (which I 
have now set to 10 bytes) must start with space characters followed by a run of 
1-3 non-space characters. This is probably still not the end of it; the next 
step would be to require that the non-space character sequence be a valid PDF 
operator. The change was committed in rev 1613645 for the trunk and rev 1613646 
for the 1.8 branch.
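
For readers who want to see the shape of this heuristic, here is a minimal sketch in Java. It is not the code committed to PDFStreamParser: the class name, the method names and the exact whitespace set are assumptions made for illustration, and only the 10-byte window and the 1-3 character rule come from the comment above. The sketch assumes the caller has already matched "EI" and passes in the bytes that follow it; it does not handle an EI at the very end of the stream.

```java
// Illustrative sketch only; names are hypothetical, not PDFBox API.
class InlineImageEndCheck {
    // Number of bytes inspected after "EI" (the "10 bytes" from the comment).
    static final int WINDOW = 10;

    // Assumed whitespace set: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    // Returns true if the bytes after "EI" look like ordinary content-stream
    // text: no control or non-ASCII bytes in the window, and the first
    // non-space run is 1-3 characters long (the length of a PDF operator).
    static boolean looksLikeEndOfImage(byte[] following) {
        int n = Math.min(following.length, WINDOW);

        // Step 1: the window must not contain anything that looks binary.
        for (int k = 0; k < n; k++) {
            int b = following[k] & 0xFF;
            if (!isWhitespace(b) && (b < 0x20 || b > 0x7E)) {
                return false;
            }
        }

        // Step 2: skip the space characters that follow "EI".
        int i = 0;
        while (i < n && isWhitespace(following[i] & 0xFF)) {
            i++;
        }

        // Step 3: measure the first run of non-space characters.
        int start = i;
        while (i < n && !isWhitespace(following[i] & 0xFF)) {
            i++;
        }
        int runLength = i - start;
        return runLength >= 1 && runLength <= 3;
    }
}
```

With the bytes from the attached file, the window contains only printable characters, so the "not binary" test alone would accept it. The run `DB'Z[` is five characters long, though, so step 3 rejects this EI as the end of the image, whereas a tail such as ` Q` followed by a line break is accepted.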

> inline image with EI in the middle incorrectly parsed
> -----------------------------------------------------
>
>                 Key: PDFBOX-2163
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2163
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.6, 1.8.7, 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>              Labels: inline
>             Fix For: 1.8.7, 2.0.0
>
>         Attachments: PDFBOX-2163-029016.pdf
>
>
> This PDF
> http://digitalcorpora.org/corp/nps/files/govdocs1/876/876636.pdf
> causes an exception because the end of an inline image is improperly detected. 
> The stream looks like this:
> {code}
> BI
>   /W 452
>   /H 169
>   /BPC 8
>   /CS /RGB
>   /D [0.0 1.0 0.0 1.0 0.0 1.0]
>   /F [/A85 /Fl]
> ID
> ......................................................
> ....................................................EI
> ......................................................
> ...
> ....
> EI Q
> {code}
> The inline images are handled in PDFStreamParser. This is tricky: we look for 
> binary data following the EI to check that it isn't an EI in the middle of the 
> image, but here what follows isn't binary data but ASCII85-encoded data. We 
> also can't require that there be an LF before the EI, because I remember a PDF 
> at work, created by a well-known company, that doesn't use one.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
