[jira] Resolved: (PDFBOX-291) Text Extraction strips 1 char when extracting a twin pair

JIRA Sun, 03 Oct 2010 05:04:58 -0700

     [ https://issues.apache.org/jira/browse/PDFBOX-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Andreas Lehmkühler resolved PDFBOX-291.
---------------------------------------

    Fix Version/s: 1.3.0
       Resolution: Fixed

Works fine with the current trunk version (rev. 1003396). I attached the text 
extraction result.

> Text Extraction strips 1 char when extracting a twin pair
> ---------------------------------------------------------
>
>                 Key: PDFBOX-291
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-291
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>            Priority: Minor
>             Fix For: 1.3.0
>
>         Attachments: PDFBOX291-doublesTest.pdf, PDFBOX291-doublesTest.txt
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1761570
> Originally submitted by nalundgaard on 2007-07-26 13:51.
> See attached file. We found a bug in PDFBox where it appears to randomly 
> delete 1 character of a twin pair of characters. 
> For example, we've noticed that what shows up as 1001 in a PDF file (in 
> Acrobat) may become 101 in the text output of TextStripper.exe. This appears 
> to happen to a large number of twin pairs, as evidenced by the attached file. 
> Note that the file was created using Microsoft Word 11.3.5 on Mac OS X using 
> the "print to PDF" feature of Mac OS X 10.4.10. 
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1761570&file_id=238686
> doublesTest.zip (application/zip), 40221 bytes
> This zip file contains a test PDF file and the text output from running 
> ExtractText.exe on it, in version 0.7.2 and 0.7.3
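
The symptom in the report (1001 in the PDF becoming 101 in the extracted text) can be checked mechanically when comparing an expected text file against extraction output. The following is a minimal sketch that does not depend on PDFBox; the class and method names (TwinPairCheck, losesTwinChar, findCollapsed) are hypothetical and chosen for illustration only, and it assumes both texts split into the same sequence of whitespace-separated tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: detects tokens where one
// character of an adjacent identical pair ("twin pair") went missing.
class TwinPairCheck {

    // True if 'actual' equals 'expected' with exactly one character of
    // some adjacent identical pair removed, e.g. "1001" -> "101".
    static boolean losesTwinChar(String expected, String actual) {
        if (actual.length() != expected.length() - 1) {
            return false;
        }
        for (int i = 0; i + 1 < expected.length(); i++) {
            if (expected.charAt(i) == expected.charAt(i + 1)) {
                // Drop one character of the pair and compare.
                String collapsed = expected.substring(0, i) + expected.substring(i + 1);
                if (collapsed.equals(actual)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Compares two texts token by token (assumption: tokens line up
    // one-to-one) and lists every token that lost a twin character.
    static List<String> findCollapsed(String expectedText, String extractedText) {
        String[] exp = expectedText.trim().split("\\s+");
        String[] act = extractedText.trim().split("\\s+");
        List<String> hits = new ArrayList<>();
        int n = Math.min(exp.length, act.length);
        for (int i = 0; i < n; i++) {
            if (losesTwinChar(exp[i], act[i])) {
                hits.add(exp[i] + " -> " + act[i]);
            }
        }
        return hits;
    }
}
```

Running findCollapsed over the expected text and the output of an affected version would list the damaged tokens; an empty list for the same input on a later version is consistent with the fix, though it does not prove the extraction is otherwise correct.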

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
