[jira] Commented: (PDFBOX-938) Wrong extracted text using PDFBox 1.4

JIRA Sun, 23 Jan 2011 04:47:10 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985311#action_12985311
 ]


Andreas Lehmkühler commented on PDFBOX-938:
-------------------------------------------

Text extraction works fine using the current trunk. AFAIU you might have a 
problem with the encoding used in YOUR application, which has nothing to do 
with PDFBox. I'm not an AWT expert, but you probably have to configure the 
TextArea you are using so that it renders the text correctly.
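
To illustrate the kind of application-side encoding problem described above, 
here is a minimal sketch in plain Java. It does not use PDFBox. It assumes 
(this is not confirmed in the issue) that the extracted text contains the 
Unicode ligature U+FB01 ("fi") rather than the two letters "f" and "i". The 
class name EncodingDemo and its helper methods are hypothetical names chosen 
for this example only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class EncodingDemo {

    // Encode the text as UTF-8, then decode the bytes with the given charset,
    // as an application with a mismatched encoding setting might do.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    // Replace compatibility characters such as the "fi" ligature (U+FB01)
    // with their plain-letter equivalents using NFKC normalization.
    static String expandLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Assumed extractor output: "first" spelled with the fi ligature.
        String extracted = "\uFB01rst";

        // Matching charsets preserve the text.
        System.out.println(roundTrip(extracted, StandardCharsets.UTF_8).equals(extracted));

        // A mismatched charset turns the ligature into unrelated characters.
        System.out.println(roundTrip(extracted, StandardCharsets.ISO_8859_1).equals(extracted));

        // NFKC normalization yields the plain letters "f" and "i".
        System.out.println(expandLigatures(extracted));
    }
}
```

If the display component or its font cannot render the ligature character, 
normalizing the extracted string before showing it may be a workaround. 
Whether that applies to the attached files is an assumption.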

> Wrong extracted text using PDFBox 1.4
> -------------------------------------
>
>                 Key: PDFBOX-938
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-938
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.4.0
>            Reporter: Hesham
>         Attachments: Another book - Wrong extracted f char.pdf, 
> Another+book+-+Wrong+extracted+f+char.txt, Wrong extracted f char.pdf
>
>
> Hello ,
>  
> I am using PDFBox v1.4 to extract some text from a PDF, but some words are 
> not extracted right.
> For example words :
> "Nefteiugansk" is read: "Nežeiugansk"
> "fiancee" is read: "Äancée"
> "first" is read: "Ärst"
>  
> Please check the attached file to test this.
> Best regards

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
