[ 
https://issues.apache.org/jira/browse/PDFBOX-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Lehmkühler resolved PDFBOX-751.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4.0
         Assignee: Andreas Lehmkühler

You are using quite an old version. You should at least try version 1.3.1 or, 
better, the upcoming new release.

I attached the resulting text extracted with the current trunk version.
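
To compare the output of an older version against the attached text, a rough 
check along the following lines might help. It is only a sketch in plain Java 
with no PDFBox dependency: it counts single-character lines that directly 
follow a longer line, which is the pattern shown in the report. The class and 
method names are hypothetical, and the heuristic will also count legitimate 
one-character lines, so treat the result as a hint rather than a verdict.

```java
final class SplitCharCheck {

    // Hypothetical helper, not part of PDFBox: counts lines that hold exactly
    // one character and directly follow a line of two or more characters,
    // e.g. "Thi" followed by "s".
    public static int countSplitLastChars(String text) {
        String[] lines = text.split("\\r?\\n");
        int count = 0;
        for (int i = 1; i < lines.length; i++) {
            String previous = lines[i - 1].trim();
            String current = lines[i].trim();
            if (current.length() == 1 && previous.length() > 1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample strings modelled on the report; real input would be the
        // text returned by the extraction.
        String broken = "Thi\ns\nerro\nr\nwa\ns";
        String fixed = "This error was";
        System.out.println(countSplitLastChars(broken)); // prints 3
        System.out.println(countSplitLastChars(fixed));  // prints 0
    }
}
```

A count that drops to zero, or close to it, after upgrading would suggest the 
rotated page is now extracted as whole words.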

> Text Extraction truncates last character when image page has sideways text
> --------------------------------------------------------------------------
>
>                 Key: PDFBOX-751
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-751
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.1.0
>         Environment: HP UX 11iV1
>            Reporter: Chris Chadwick
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.4.0
>
>         Attachments: getimage1.pdf, PDFBOX751-getimage1.txt
>
>
> When using unsorted text extraction on a PDF that contains a horizontal page 
> (normal orientation text) followed by a page where all the text is rotated 90 
> degrees (landscape), the last character of each word is forced onto a new 
> line. For example:
> Thi
> s
> erro
> r
> wa
> s
> logge
> d
> toda
> y
> It is only the last letter of each phrase that is affected, and it is only 
> affected on the rotated page.
> Selecting the text from the image directly (in Adobe, 'Select All', then cut) 
> produces the correct results, as do other tools, so the text layer appears 
> correct in the PDF file.
> Also, could you please publish when V1.2 will be ready, as this may resolve 
> this issue? Is it available as a beta?
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
