[ https://issues.apache.org/jira/browse/PDFBOX-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989765#comment-12989765 ]
Andreas Lehmkühler commented on PDFBOX-938:
-------------------------------------------
@Hesham
I can confirm the issue with your sample, but I can't help you with it. As I
already said, I'm not an AWT expert, but it seems that something is wrong with
the encoding or the font used in your application.
As the current trunk works fine, I'm going to resolve this issue.
> Wrong extracted text using PDFBox 1.4
> -------------------------------------
>
> Key: PDFBOX-938
> URL: https://issues.apache.org/jira/browse/PDFBOX-938
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 1.4.0
> Reporter: Hesham
> Fix For: 1.5.0
>
> Attachments: Another book - Wrong extracted f char.pdf,
> Another+book+-+Wrong+extracted+f+char.txt, Sample.zip, Wrong extracted f
> char.pdf
>
>
> Hello,
>
> I am using PDFBox v1.4 to extract text from a PDF, but some words are not
> extracted correctly.
> For example, the words:
> "Nefteiugansk" is extracted as: "Nežeiugansk"
> "fiancee" is extracted as: "Äancée"
> "first" is extracted as: "Ärst"
>
> Please check the attached file to test this.
> Best regards
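The comment above suggests the problem may lie in the encoding or the font used to display the text, rather than in the extraction itself. One way to separate the two is to dump the raw code points of the extracted string, so that what PDFBox returned can be inspected without depending on any display font. The sketch below assumes the extracted text is already available as a Java `String`; the class and method names (`LigatureCheck`, `codePoints`, `expandLigatures`) are hypothetical, and the sample input is an illustrative value, not output from the attached PDFs. It also shows, as an assumption about one possible cause, how NFKC normalization expands Unicode ligature characters such as U+FB01 ("fi") into plain letters.

```java
import java.text.Normalizer;

public class LigatureCheck {
    // Formats every code point of s as U+XXXX, so the characters actually
    // returned by the extractor can be checked independently of any font.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    // NFKC replaces compatibility characters such as U+FB01
    // (LATIN SMALL LIGATURE FI) with their plain-letter equivalents ("fi").
    static String expandLigatures(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Hypothetical extractor output containing the U+FB01 "fi" ligature.
        String extracted = "\uFB01rst";
        System.out.println(codePoints(extracted));      // U+FB01 U+0072 U+0073 U+0074
        System.out.println(expandLigatures(extracted)); // first
    }
}
```

If the dumped code points are already wrong (for example U+00C4 "Ä" where "fi" was expected), the problem is in the extraction. If they are correct but the text still looks wrong on screen, the font or encoding used by the application is the more likely culprit.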
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira