[ https://issues.apache.org/jira/browse/PDFBOX-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr resolved PDFBOX-2385.
-------------------------------------
Resolution: Fixed
> inline image with EI at the end incorrectly parsed
> --------------------------------------------------
>
> Key: PDFBOX-2385
> URL: https://issues.apache.org/jira/browse/PDFBOX-2385
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Affects Versions: 1.8.7, 1.8.8, 2.0.0
> Reporter: Tilman Hausherr
> Assignee: Tilman Hausherr
> Labels: regression
> Fix For: 1.8.8, 2.0.0
>
> Attachments: PDFBOX-2385-146515.pdf, PDFBOX-2385-539663.pdf,
> PDFBOX-2385-862497.pdf, PDFBOX-2385-893083.pdf
>
>
> I'm looking at the files from TIKA-1419 that show a big decrease in
> the token count, and I found another problem with inline images. This time,
> the content stream looks like this:
> {code}
> ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffEI
> Q
> {code}
> Because of the first change in PDFBOX-2163, PDFBox assumes that this is
> ASCII85 data, but it isn't. In my own tests, deleting the "Ascii85" test [
> http://svn.apache.org/r1606177 ] and keeping the second change [
> http://svn.apache.org/r1613645 ] (expecting spaces, 1-3 chars, blanks) works
> fine.
> Over the next few days or weeks, I will look at some of the files (those with
> a big token count decrease) mentioned in [[email protected]]'s CSV file.
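The two heuristics discussed in the report can be sketched in Java, PDFBox's own language. This is a minimal illustration, not PDFBox's actual parser code: the class and method names (`InlineImageHeuristics`, `looksLikeAscii85Alphabet`, `looksLikeEndOfInlineImage`, `findInlineImageEnd`) are hypothetical. Reading "spaces, 1-3 chars, blanks" as "whitespace after `EI`, then a 1-3 byte operator token such as `Q`, then whitespace or end of data" is also an assumption. The alphabet check shows why a purely alphabet-based ASCII85 test misfires. Every hex digit (`0-9`, `A-F`, `a-f`) lies inside the ASCII85 range `!`..`u`, so data like `ffff...` passes it.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the heuristics discussed in PDFBOX-2385.
 * NOT PDFBox's implementation; names and exact rules are illustrative assumptions.
 */
public class InlineImageHeuristics {

    // PDF whitespace characters (ISO 32000-1, Table 1): NUL, HT, LF, FF, CR, SP.
    static boolean isWhitespace(byte b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    /**
     * Alphabet-only test: true if every non-whitespace byte is in the ASCII85
     * range '!'..'u', the 'z' shortcut, or '~' from the "~>" EOD marker.
     * Hex-encoded data also passes, which is the ambiguity in this issue.
     */
    static boolean looksLikeAscii85Alphabet(byte[] data) {
        for (byte b : data) {
            if (isWhitespace(b)) {
                continue;
            }
            // bytes >= 0x80 are negative in Java, so they fail the range check
            boolean inRange = b >= '!' && b <= 'u';
            if (!inRange && b != 'z' && b != '~') {
                return false;
            }
        }
        return true;
    }

    /**
     * Assumed "spaces, 1-3 chars, blanks" check for a candidate "EI" at eiPos:
     * whitespace must follow EI, then a 1-3 byte token (e.g. the Q operator),
     * then whitespace or end of data. EI at the very end is also accepted.
     */
    static boolean looksLikeEndOfInlineImage(byte[] stream, int eiPos) {
        int i = eiPos + 2; // skip "EI"
        if (i == stream.length) {
            return true; // nothing follows EI
        }
        if (!isWhitespace(stream[i])) {
            return false; // e.g. "EIx" inside binary image data
        }
        while (i < stream.length && isWhitespace(stream[i])) {
            i++;
        }
        if (i == stream.length) {
            return true; // only whitespace follows EI
        }
        int tokenStart = i;
        while (i < stream.length && !isWhitespace(stream[i])) {
            i++;
        }
        // the token ends at whitespace or end of data
        int tokenLength = i - tokenStart;
        return tokenLength >= 1 && tokenLength <= 3;
    }

    /** Returns the index of the first plausible "EI" at or after dataStart, or -1. */
    static int findInlineImageEnd(byte[] stream, int dataStart) {
        for (int i = dataStart; i + 1 < stream.length; i++) {
            if (stream[i] == 'E' && stream[i + 1] == 'I' && looksLikeEndOfInlineImage(stream, i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Same shape as the report: hex digits with EI appended directly, then Q.
        String hexData = "ffffffffffffffff";
        byte[] stream = (hexData + "EI\nQ\n").getBytes(StandardCharsets.US_ASCII);

        System.out.println(looksLikeAscii85Alphabet(hexData.getBytes(StandardCharsets.US_ASCII))); // true
        // prints the index where "EI" starts, i.e. hexData.length()
        System.out.println(findInlineImageEnd(stream, 0));
    }
}
```

The sketch does not require whitespace *before* `EI`, because in the reported file `EI` follows the last hex digit with no separator. A real parser also has to handle binary data that happens to contain `EI` followed by a short token, which this check alone cannot rule out.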
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)