[jira] [Commented] (PDFBOX-2272) Can't extract vertical text correctly

John Hewson (JIRA) Tue, 30 Sep 2014 23:40:57 -0700

    [ https://issues.apache.org/jira/browse/PDFBOX-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154453#comment-14154453 ]


John Hewson commented on PDFBOX-2272:
-------------------------------------

1.8 can extract Unicode text in general but fails for this particular font. The 
2.0 trunk can successfully extract the text for this font. Neither version can 
handle the vertical layout correctly, so the text comes out in the wrong order. 
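
To illustrate what "the wrong order" means, here is a minimal sketch. It is not PDFBox code: the Glyph record and both ordering methods are hypothetical. It assumes glyph positions are already known in PDF user space (y increases upward). It compares a horizontal reading order (lines top to bottom, glyphs left to right) with a vertical one (columns right to left, glyphs top to bottom within each column), as used for Japanese vertical writing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class VerticalOrderSketch {
    // Hypothetical glyph: its Unicode text plus its position in PDF user space.
    record Glyph(String text, float x, float y) {}

    // Horizontal order: higher y first (top line first), then left to right.
    static String horizontalOrder(List<Glyph> glyphs) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> -g.y())
                .thenComparingDouble(Glyph::x));
        return join(sorted);
    }

    // Vertical order: rightmost column first, then top to bottom in each column.
    static String verticalOrder(List<Glyph> glyphs) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> -g.x())
                .thenComparingDouble(g -> -g.y()));
        return join(sorted);
    }

    static String join(List<Glyph> glyphs) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) {
            sb.append(g.text());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two vertical columns: "ABC" on the right (x=200), "DEF" on the left (x=100).
        List<Glyph> page = List.of(
                new Glyph("A", 200, 700), new Glyph("B", 200, 688), new Glyph("C", 200, 676),
                new Glyph("D", 100, 700), new Glyph("E", 100, 688), new Glyph("F", 100, 676));
        System.out.println(horizontalOrder(page)); // DAEBFC
        System.out.println(verticalOrder(page));   // ABCDEF
    }
}
```

A horizontal ordering reads across the columns row by row and interleaves them (DAEBFC), whereas the vertical ordering keeps each column intact (ABCDEF).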

> Can't extract vertical text correctly
> -------------------------------------
>
>                 Key: PDFBOX-2272
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2272
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.6, 2.0.0
>            Reporter: Biligsaikhan Batjargal
>         Attachments: test.pdf, test.txt
>
>
> - 1.8.6 can't extract the Unicode text because it fails to map the UCS2 CMap
> for 90ms-RKSJ-V.
> - 2.0 extracts the text but can't handle the vertical layout
> Also see the file from PDFBOX-2294 which contains both horizontal and 
> vertical text.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
