[ https://issues.apache.org/jira/browse/PDFBOX-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Hewson closed PDFBOX-1017.
-------------------------------
Resolution: Not a Problem
This PDF uses private use characters for the ligatures, so the information
about their original meaning has been lost. Adobe Acrobat produces the same
result, so there's nothing to fix here.
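
To illustrate why the meaning cannot be recovered, here is a minimal sketch in
plain Java (it does not use PDFBox). It assumes the extracted text is already
available as a String. The class LigatureCheck, its method names, and the
sample code point U+E000 are hypothetical and chosen only for illustration.
Standard ligatures such as U+FB01 carry a Unicode compatibility decomposition,
so NFKC normalization expands them. Private use code points carry no such
mapping and pass through unchanged.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration only; not part of PDFBox.
final class LigatureCheck {

    // True if the code point belongs to a Unicode private use area
    // (general category Co), e.g. U+E000..U+F8FF.
    static boolean isPrivateUse(int codePoint) {
        return Character.getType(codePoint) == Character.PRIVATE_USE;
    }

    // Counts the private use code points in the given text.
    static int countPrivateUse(String text) {
        return (int) text.codePoints()
                .filter(LigatureCheck::isPrivateUse)
                .count();
    }

    // Expands ligatures that have a standard compatibility decomposition.
    // Note: NFKC also rewrites other compatibility characters.
    static String expandStandardLigatures(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // U+FB01 LATIN SMALL LIGATURE FI has a defined decomposition.
        String standard = "\uFB01sh";
        // U+E000 is an assumed stand-in for a font-specific ligature glyph.
        String privateUse = "S\uE000rift";

        System.out.println(expandStandardLigatures(standard));
        System.out.println(countPrivateUse(expandStandardLigatures(privateUse)));
    }
}
```

A check like countPrivateUse can flag extracted text that contains such
characters, but mapping them back to letters would need a font-specific table
that the PDF itself does not provide.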
> Some Ligatures in a PDF file are not recognised.
> ------------------------------------------------
>
> Key: PDFBOX-1017
> URL: https://issues.apache.org/jira/browse/PDFBOX-1017
> Project: PDFBox
> Issue Type: Improvement
> Components: Text extraction
> Affects Versions: 1.6.0
> Environment: Mac OS X 10.6.7, java version "1.6.0_24"
> Reporter: Thomas Fischer
> Labels: textExtraction
> Attachments: Ligatures.pdf, Ligatures.txt
>
>
> In the attached file, some ligatures (Qu, Th, ch, ck, fft, ft, tt) are not
> transformed but remain in the text with Unicode characters in the private
> range U+E0xx: "...im rabbinisen Sritum in untersiedlien Kontexten und
> dort,..."