[jira] Commented: (PDFBOX-938) Wrong extracted text using PDFBox 1.4

JIRA Wed, 02 Feb 2011 11:45:54 -0800

    [ 
https://issues.apache.org/jira/browse/PDFBOX-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989765#comment-12989765
 ]


Andreas Lehmkühler commented on PDFBOX-938:
-------------------------------------------

@Hesham 
I can confirm the issue with your sample, but I can't help you with it. As I 
already said, I'm not an AWT expert, but it seems that something is wrong with 
the encoding or the font used in your application.

Since the current trunk works fine, I'm going to resolve this issue.

> Wrong extracted text using PDFBox 1.4
> -------------------------------------
>
>                 Key: PDFBOX-938
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-938
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.4.0
>            Reporter: Hesham
>             Fix For: 1.5.0
>
>         Attachments: Another book - Wrong extracted f char.pdf, 
> Another+book+-+Wrong+extracted+f+char.txt, Sample.zip, Wrong extracted f 
> char.pdf
>
>
> Hello,
>  
> I am using PDFBox v1.4 to extract some text from a PDF, but some words are 
> not extracted correctly.
> For example:
> "Nefteiugansk" is read: "Nežeiugansk"
> "fiancee" is read: "Äancée"
> "first" is read: "Ärst"
>  
> Please check the attached file to test this.
> Best regards

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
