[
https://issues.apache.org/jira/browse/PDFBOX-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003459#comment-13003459
]
Andreas Lehmkühler commented on PDFBOX-970:
-------------------------------------------
I can't confirm the umlaut issue. The latest snapshot works fine for me. Do you
have the ICU jar (icu4j) on your classpath?
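
A quick way to answer the classpath question is to try loading a class from the jar at runtime. The following is only a minimal sketch: the class name `IcuCheck` and the helper `isOnClasspath` are made up for illustration, and it assumes that `com.ibm.icu.text.Normalizer` is a class shipped in the ICU4J jar.

```java
public class IcuCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // 'false' avoids running static initializers; we only want to know if it resolves.
            Class.forName(className, false, IcuCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: this class is part of the ICU4J jar.
        System.out.println("ICU4J present: " + isOnClasspath("com.ibm.icu.text.Normalizer"));
    }
}
```

Run it with the same classpath that is used for the text extraction; if it reports `false`, the jar is probably not being picked up.
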
The position of the German quotation mark seems to be misinterpreted. Because it
is placed very low on the line, the algorithm presumes it has to be on the next
line. This was already an issue with 1.4.0.
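
To illustrate the kind of effect I mean, here is a simplified sketch of a line-grouping heuristic. It is not the actual PDFBox code: the `Glyph` type, the `groupIntoLines` method, and the tolerance values are all hypothetical. Glyphs are sorted by their vertical position, and a glyph whose position differs from the line's reference position by more than a tolerance starts a new line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LineGroupingSketch {

    /** Hypothetical glyph record: text, x position, and y position (larger y = lower on the page). */
    public static final class Glyph {
        final String text;
        final float x;
        final float y;

        public Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Groups glyphs into lines; a glyph further than 'tolerance' from the line's y starts a new line. */
    public static List<String> groupIntoLines(List<Glyph> glyphs, float tolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y));

        List<List<Glyph>> lines = new ArrayList<>();
        float lineY = 0f; // y of the first glyph of the current line
        for (Glyph g : sorted) {
            if (lines.isEmpty() || Math.abs(g.y - lineY) > tolerance) {
                lines.add(new ArrayList<>());
                lineY = g.y;
            }
            lines.get(lines.size() - 1).add(g);
        }

        List<String> result = new ArrayList<>();
        for (List<Glyph> line : lines) {
            // Within a line, restore left-to-right reading order.
            line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder sb = new StringBuilder();
            for (Glyph g : line) {
                sb.append(g.text);
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = Arrays.asList(
                new Glyph("\u201E", 0f, 104f),  // low German opening quote, drawn 4 units below the text
                new Glyph("Hallo", 5f, 100f),
                new Glyph("Welt", 0f, 112f));
        System.out.println(groupIntoLines(glyphs, 2f)); // tight tolerance: quote ends up on its own line
        System.out.println(groupIntoLines(glyphs, 5f)); // looser tolerance: quote stays with "Hallo"
    }
}
```

With the made-up numbers above, a tolerance of 2 puts the quote on a separate line after "Hallo", while a tolerance of 5 keeps it at the start of the "Hallo" line. A real implementation would have to derive the tolerance from the font size rather than use a constant.
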
I guess the JIRA error occurred because of some maintenance (the infra guys
just upgraded JIRA to 4.2.4).
> TeX-created ligatures and umlauts are not recognised
> ----------------------------------------------------
>
> Key: PDFBOX-970
> URL: https://issues.apache.org/jira/browse/PDFBOX-970
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 1.5.0
> Environment: Mac OS X 10.6.6, Java(TM) SE Runtime Environment (build
> 1.6.0_22-b04-307-10M3261)
> Reporter: Thomas Fischer
> Labels: textExtraction
> Attachments: A Python Library for Provenance Recording and
> Querying.txt, A Python Library for Provenance Recording and Querying.txt,
> Test.pdf, Test.pdf
>
>
> Ligatures in a TeX-created document are lost, although they were recognised by
> v. 1.4, e.g.
> 1.4        1.5
> official   ocial
> effort     e ort
> fields     elds
> first      rst
> In addition, German umlauts (ä, ö, ü) are represented as ( a, o, u),