Text Extraction truncates last character when image page has sideways text
--------------------------------------------------------------------------
Key: PDFBOX-751
URL: https://issues.apache.org/jira/browse/PDFBOX-751
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 1.1.0
Environment: HP UX 11iV1
Reporter: Chris Chadwick
When using unsorted text extraction on a PDF that contains a horizontal page
(normal orientation text) followed by a page where all the text is rotated 90
degrees (landscape), the last character of each word is forced onto a new
line. For example:
Thi
s
erro
r
wa
s
logge
d
toda
y
It is only the last letter of each phrase that is affected, and it is only
affected on the rotated page.
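
Until the extraction itself is fixed, the symptom shown above could be
masked by post-processing the extracted text. The following is only a minimal
sketch of that idea, not a fix and not part of PDFBox: the class
OrphanCharJoiner and its rejoin method are hypothetical names, and the sketch
assumes that any line holding exactly one non-blank character belongs to the
line before it. That assumption also merges genuine one-letter lines, so it
should only be applied to text from the rotated page.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical post-processing workaround (not part of PDFBox): appends a
 * single-character line to the line that precedes it.
 */
final class OrphanCharJoiner {

    private OrphanCharJoiner() {}

    /** Returns the text with each one-character line joined to the previous line. */
    static String rejoin(String extracted) {
        List<String> out = new ArrayList<>();
        for (String line : extracted.split("\\r?\\n")) {
            String trimmed = line.trim();
            // Assumption: a lone character after a non-empty line is an orphaned last letter.
            boolean orphan = trimmed.length() == 1
                    && !out.isEmpty()
                    && !out.get(out.size() - 1).isEmpty();
            if (orphan) {
                int last = out.size() - 1;
                out.set(last, out.get(last) + trimmed);
            } else {
                out.add(line);
            }
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        // The broken output quoted in this report.
        String broken = "Thi\ns\nerro\nr\nwa\ns\nlogge\nd\ntoda\ny";
        System.out.println(rejoin(broken));
    }
}
```

Running main on the sample from this report prints the five words with their
last letters restored, one word per line, because each word was already on
its own line in the broken output.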
Selecting the text directly from the page image (in Adobe, 'Select All' then
cut) produces the correct results, as do other tools, so the text layer
appears to be correct in the PDF file.
Also, could you please publish when V1.2 will be ready, as it may resolve
this issue? Is it available as a beta?