[ https://issues.apache.org/jira/browse/PDFBOX-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003189#comment-13003189 ]

Andreas Lehmkühler commented on PDFBOX-970:
-------------------------------------------

I solved the issue in revision 1078518, but I can only confirm that it works 
for ligatures, as your example doesn't contain any German umlauts. Can you 
provide us with another example, or can you confirm that this solution also 
works for that kind of PDF?

> TeX-created ligatures and umlauts are not recognised
> ----------------------------------------------------
>
>                 Key: PDFBOX-970
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-970
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 1.5.0
>         Environment: Mac OS X 10.6.6, Java(TM) SE Runtime Environment (build 
> 1.6.0_22-b04-307-10M3261)
>            Reporter: Thomas Fischer
>              Labels: textExtraction
>         Attachments: A Python Library for Provenance Recording and 
> Querying.txt, A Python Library for Provenance Recording and Querying.txt
>
>
> Ligatures in a TeX-created document, which are recognised by v. 1.4, are lost 
> in 1.5, e.g.
>   1.4          1.5
> official      ocial
> effort        e ort
> fields        elds
> first          rst
> In addition, German umlauts (ä, ö, ü) are represented as ( a,  o,  u), 
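
A minimal post-processing sketch of the two symptoms described above, under stated assumptions. It does not reproduce the fix in revision 1078518 and is not part of PDFBox; the class `TexTextCleanup` and its method `clean` are hypothetical. It assumes (1) that ligature glyphs have already been mapped to the Unicode ligature code points U+FB00 to U+FB04 (ff, fi, fl, ffi, ffl), and (2) that a TeX-built umlaut is extracted as a spacing diaeresis (U+00A8) placed directly before its base letter. Under those assumptions, `java.text.Normalizer` with NFKC expands the ligatures and composes the umlauts.

```java
import java.text.Normalizer;

// Hypothetical helper, NOT part of PDFBox: cleans up already-extracted text.
public final class TexTextCleanup {

    private static final char SPACING_DIAERESIS = '\u00A8';
    private static final char COMBINING_DIAERESIS = '\u0308';

    private TexTextCleanup() {}

    /**
     * Assumption: accents are extracted as a spacing diaeresis immediately
     * before the base letter, and ligatures as U+FB00..U+FB04 characters.
     */
    public static String clean(String extracted) {
        StringBuilder sb = new StringBuilder(extracted.length());
        for (int i = 0; i < extracted.length(); i++) {
            char c = extracted.charAt(i);
            if (c == SPACING_DIAERESIS && i + 1 < extracted.length()
                    && Character.isLetter(extracted.charAt(i + 1))) {
                // Move the accent behind its base letter as a combining mark.
                sb.append(extracted.charAt(i + 1)).append(COMBINING_DIAERESIS);
                i++; // the base letter has already been appended
            } else {
                sb.append(c);
            }
        }
        // NFKC expands compatibility ligatures (U+FB03 becomes "ffi") and
        // composes a letter followed by U+0308 into the precomposed umlaut.
        return Normalizer.normalize(sb, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // prints: official effort fields first
        System.out.println(clean("o\uFB03cial e\uFB00ort \uFB01elds \uFB01rst"));
        System.out.println(clean("\u00A8a \u00A8o \u00A8u"));
    }
}
```

NFKC also folds other compatibility characters (superscripts, full-width forms and so on), so a real fix belongs in the font/encoding mapping during extraction rather than in a blanket normalization pass like this one.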

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

