Tilman Hausherr created PDFBOX-2385:
---------------------------------------

             Summary: inline image with EI at the end incorrectly parsed
                 Key: PDFBOX-2385
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2385
             Project: PDFBox
          Issue Type: Bug
          Components: Parsing
    Affects Versions: 1.8.7, 1.8.8, 2.0.0
            Reporter: Tilman Hausherr
            Assignee: Tilman Hausherr
             Fix For: 1.8.8, 2.0.0


I'm looking at the files from TIKA-1419 that show a big difference in token 
counts, and I found another problem with inline images. This time, the file 
contains this:
{code}
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffEI
Q
{code}
Because of the first change in PDFBOX-2163, PDFBox assumes that this is 
ASCII85-encoded data, but it isn't. In my own tests, removing the "Ascii85" 
test [ http://svn.apache.org/r1606177 ] and keeping the second change [ 
http://svn.apache.org/r1613645 ] (which expects whitespace after the EI, then 
1-3 characters, then a blank) works fine.
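
A minimal sketch of the kind of check the second change describes (whitespace 
after the EI, then a short token of 1-3 characters, then a blank) might look 
like the Java below. It is an illustration based only on that description, not 
the code from r1613645. The class and method names are hypothetical, and the 
whitespace set is the one the PDF specification defines. Binary image data can 
still match such a pattern by chance, so this is a heuristic.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class InlineImageEndSketch {

    // PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
    private static boolean isPdfWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    // True if "EI" at pos is followed by whitespace, then a token of
    // 1-3 printable ASCII characters, then whitespace or end of data.
    static boolean looksLikeEndOfImage(byte[] data, int pos) {
        if (pos < 0 || pos + 1 >= data.length
                || data[pos] != 'E' || data[pos + 1] != 'I') {
            return false;
        }
        int i = pos + 2;
        int afterEi = i;
        while (i < data.length && isPdfWhitespace(data[i])) {
            i++;
        }
        if (i == afterEi) {
            return false; // no whitespace directly after "EI"
        }
        int tokenStart = i;
        // Bytes are signed in Java, so values >= 0x80 fail the "> 32" test.
        while (i < data.length && data[i] > 32 && data[i] < 127) {
            i++;
        }
        int tokenLength = i - tokenStart;
        if (tokenLength < 1 || tokenLength > 3) {
            return false; // not a short operator-like token
        }
        return i == data.length || isPdfWhitespace(data[i]);
    }

    // Returns the offset of the first plausible "EI" at or after from, or -1.
    static int findEndOfImage(byte[] data, int from) {
        for (int p = Math.max(from, 0); p + 1 < data.length; p++) {
            if (looksLikeEndOfImage(data, p)) {
                return p;
            }
        }
        return -1;
    }
}
```

With the data shown above, the EI directly after the run of "f" characters 
is accepted because it is followed by a line break, the one-character token Q, 
and another line break. Nothing before the EI is inspected.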

Over the next few days or weeks, I will look at some of the files (those with 
a big decrease in token count) mentioned in [[email protected]]'s CSV file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
