[jira] [Commented] (PDFBOX-2272) Can't extract vertical text correctly

John Hewson (JIRA) Wed, 15 Jul 2015 12:42:04 -0700

    [ 
https://issues.apache.org/jira/browse/PDFBOX-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628599#comment-14628599
 ]


John Hewson commented on PDFBOX-2272:
-------------------------------------

The idea behind your patch looks good to me - it's a nice, simple approach to 
solving a relatively uncommon and rather tricky problem.  If you can keep the 
number of changes to the original code as small as possible (e.g. 
handleTextPosition() looks to me like refactoring that isn't strictly needed), 
it'll be much easier for us to review, apply, and maintain your patch. Thanks 
for the effort!

> Can't extract vertical text correctly
> -------------------------------------
>
>                 Key: PDFBOX-2272
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2272
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.8.6, 2.0.0
>            Reporter: Biligsaikhan Batjargal
>         Attachments: PDFTextStripper.java, test.pdf, test.txt, vertical.patch
>
>
> - -1.8.6 can't extract the Unicode due to failing to map the UCS2 CMap for 
> 90ms-RKSJ-V.-
> - 2.0 extracts the text but can't handle the vertical layout
> Also see the file from PDFBOX-2294 which contains both horizontal and 
> vertical text.



