[ https://issues.apache.org/jira/browse/PDFBOX-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-291.
---------------------------------------
Fix Version/s: 1.3.0
Resolution: Fixed
Works fine with the current trunk version (rev. 1003396). I have attached the text
extraction result.
> Text Extraction strips 1 char when extracting a twin pair
> ---------------------------------------------------------
>
> Key: PDFBOX-291
> URL: https://issues.apache.org/jira/browse/PDFBOX-291
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Priority: Minor
> Fix For: 1.3.0
>
> Attachments: PDFBOX291-doublesTest.pdf, PDFBOX291-doublesTest.txt
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1761570
> Originally submitted by nalundgaard on 2007-07-26 13:51.
> See attached file. We found a bug in PDFBox where it appears to randomly
> delete one character of a twin pair (two identical adjacent characters).
> For example, we've noticed that what shows up as 1001 in a PDF file (in
> Acrobat) may become 101 in the text output of TextStripper.exe. This appears
> to happen to a large number of twin pairs, as evidenced by the attached file.
> Note that the file was created in Microsoft Word 11.3.5 on Mac OS X with
> the "print to PDF" feature of Mac OS X 10.4.10.
> [attachment on SourceForge]
> http://sourceforge.net/tracker/download.php?group_id=78314&atid=552832&aid=1761570&file_id=238686
> doublesTest.zip (application/zip), 40221 bytes
> This zip file contains a test PDF file and the text output from running
> ExtractText.exe on it in versions 0.7.2 and 0.7.3.
--
This message is automatically generated by JIRA.