[ 
https://issues.apache.org/jira/browse/PDFBOX-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Hewson closed PDFBOX-1017.
-------------------------------

    Resolution: Not a Problem

This PDF encodes the ligatures as Unicode Private Use Area characters, so 
the information about which letters each ligature stands for is not present 
in the file. Adobe Acrobat produces the same result, so there's nothing to 
fix here.
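
For anyone who wants to check whether their own extracted text is affected, 
here is a minimal sketch that lists the private use code points in a string. 
It is not part of PDFBox: the class name PrivateUseScan and the sample code 
point U+E0A1 are made up for illustration, and the code points actually used 
in Ligatures.pdf are not known here. It relies only on the standard 
java.lang.Character classification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: scans already-extracted text
// for code points that Unicode classifies as private use.
final class PrivateUseScan {

    // Returns every private use code point found in the text, in order.
    static List<Integer> findPrivateUse(String text) {
        List<Integer> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.getType(cp) == Character.PRIVATE_USE) {
                found.add(cp);
            }
            // Advance by one or two chars so supplementary planes are handled.
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) {
        // \uE0A1 is an arbitrary stand-in for whatever code point the
        // font assigns to a ligature glyph.
        String extracted = "rabbini\uE0A1en";
        for (int cp : findPrivateUse(extracted)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

A scan like this can only report that such characters are present. Mapping 
them back to letters would need a table specific to the font in question, 
because private use code points carry no standard meaning.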

> Some Ligatures in a PDF file are not recognised.
> ------------------------------------------------
>
>                 Key: PDFBOX-1017
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1017
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Text extraction
>    Affects Versions: 1.6.0
>         Environment: Mac OS X 10.6.7, java version "1.6.0_24"
>            Reporter: Thomas Fischer
>              Labels: textExtraction
>         Attachments: Ligatures.pdf, Ligatures.txt
>
>
> In the attached file, some ligatures (Qu, Th, ch, ck, fft, ft, tt) are not 
> transformed but remain in the text with Unicode characters in the private 
> range U+E0xx: "...im rabbinisen Sritum in untersiedlien Kontexten und 
> dort,..."



--
This message was sent by Atlassian JIRA
(v6.2#6252)
