[ https://issues.apache.org/jira/browse/PDFBOX-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Lehmkühler resolved PDFBOX-751.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4.0
         Assignee: Andreas Lehmkühler

You are using quite an old version. You should try at least version 1.3.1 or, better, the upcoming new release. I have attached the text extracted with the current trunk version.

> Text Extraction truncates last character when image page has sideways text
> --------------------------------------------------------------------------
>
>                 Key: PDFBOX-751
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-751
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.1.0
>         Environment: HP UX 11iV1
>            Reporter: Chris Chadwick
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.4.0
>
>         Attachments: getimage1.pdf, PDFBOX751-getimage1.txt
>
>
> When using unsorted text extraction on a PDF that contains a horizontal page
> (normal orientation text) followed by a page where all the text is rotated 90
> degrees (landscape), the last character of each word is forced onto a new
> line. For example:
> Thi
> s
> erro
> r
> wa
> s
> logge
> d
> toda
> y
> It is only the last letter of each phrase that is affected, and it is only
> affected on the rotated page.
> Selecting the text from the image directly (in Adobe, do 'Select All', then cut)
> produces the correct results, as do other tools, so the text layer appears
> correct in the PDF file.
> Also, could you please publish when V1.2 will be ready, as this may resolve this
> issue? Is it available as a beta?
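
For anyone who cannot upgrade right away, the symptom described in the issue (the last character of each word pushed onto its own line) can be patched up after extraction. The following is only a rough sketch under stated assumptions: `OrphanCharJoiner` and `rejoin` are hypothetical names and not part of PDFBox, and the heuristic assumes that any one-character line directly following a non-empty line is an orphaned last letter. It will wrongly merge genuine one-character lines, so it is no substitute for the fix in 1.4.0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OrphanCharJoiner {

    // Hypothetical helper (not a PDFBox API): appends a one-character line
    // to the line before it, assuming it is the split-off last letter.
    public static List<String> rejoin(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean justJoined = false; // avoid chaining several orphans onto one line
        for (String line : lines) {
            String trimmed = line.trim();
            boolean orphan = trimmed.length() == 1
                    && !justJoined
                    && !out.isEmpty()
                    && !out.get(out.size() - 1).isEmpty();
            if (orphan) {
                int last = out.size() - 1;
                out.set(last, out.get(last) + trimmed);
                justJoined = true;
            } else {
                out.add(line);
                justJoined = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The fragment sequence quoted in the issue description.
        List<String> broken = Arrays.asList(
                "Thi", "s", "erro", "r", "wa", "s", "logge", "d", "toda", "y");
        System.out.println(String.join(" ", rejoin(broken)));
        // prints "This error was logged today"
    }
}
```

The input would be the extracted text of the rotated page split into lines; pages that extract correctly should not be passed through this helper.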