[jira] [Created] (PDFBOX-2247) Regression in text extraction between 1.8.5 and 1.8.6

Tim Allison (JIRA) Mon, 28 Jul 2014 06:15:11 -0700

Tim Allison created PDFBOX-2247:
-----------------------------------

             Summary: Regression in text extraction between 1.8.5 and 1.8.6
                 Key: PDFBOX-2247
                 URL: https://issues.apache.org/jira/browse/PDFBOX-2247
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 1.8.6
            Reporter: Tim Allison
            Priority: Minor



It looks like a character mapping issue crept in sometime between 1.8.5 and 1.8.6 
on this 
[file|http://digitalcorpora.org/corp/nps/files/govdocs1/701/701542.pdf]. 

With both the sequential and NonSeq parsers, ExtractText extracted the correct 
text in 1.8.5.  In 1.8.6, java -jar pdfbox-app-1.8.6.jar ExtractText 
yields text starting with: {noformat}7>PFLK>I 9>NH ;BNRF@B
=%;% .BM>NPJBKP LC PEB 3KPBNFLN
9>@FCF@ -L>OP ;@FBK@B >KA 5B>NKFKD -BKPBN
:BOB>N@E 9NLGB@P ;QJJ>NT .B@BJ?BN (&&*
"&++&,-+Æ$( #&+-&%+$-& !).&)-*+Æ&,{noformat}

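To pin down where the two versions diverge, a rough sketch like the following might help. It assumes the ExtractText output from each version has already been saved to a text file; the class name, method name, and file arguments are hypothetical and not part of PDFBox. It only reports the first line that differs between the two outputs.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of PDFBox: compares two saved extraction outputs.
class ExtractionDiff {

    /** Returns the 1-based number of the first differing line, or -1 if the inputs are identical. */
    static int firstDifferingLine(List<String> expected, List<String> actual) {
        int shared = Math.min(expected.size(), actual.size());
        for (int i = 0; i < shared; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return i + 1;
            }
        }
        // All shared lines match; the outputs differ only if one is longer.
        return expected.size() == actual.size() ? -1 : shared + 1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: ExtractionDiff <output-1.8.5.txt> <output-1.8.6.txt>");
            return;
        }
        // Assumes both files were written as UTF-8.
        List<String> oldText = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> newText = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        int line = firstDifferingLine(oldText, newText);
        System.out.println(line < 0 ? "outputs are identical" : "first difference at line " + line);
    }
}
```

For the file above, the first difference would presumably show up on line 1, since the 1.8.6 output is already garbled at the start.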


--
This message was sent by Atlassian JIRA
(v6.2#6252)
