[jira] [Created] (TIKA-3858) Ligatures convert on text extraction

tom hill (Jira) Thu, 15 Sep 2022 11:26:09 -0700

tom hill created TIKA-3858:
------------------------------

             Summary:  Ligatures convert on text extraction
                 Key: TIKA-3858
                 URL: https://issues.apache.org/jira/browse/TIKA-3858
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.5
         Environment: win 8, jre 1.5
            Reporter: tom hill
             Fix For: 1.7



From reviewing the Tika sources, Tika uses PDFBox to parse PDF files.
PDFBox itself uses ICU4J to handle ligatures.
Unfortunately, when I added the ICU4J jar to my classpath, nothing changed:
ligatures are still not converted in the extracted text. A sample PDF file is attached.
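
To illustrate what "converted" means here, the following is a minimal sketch of expanding ligature code points in already-extracted text. It is not how Tika or PDFBox handle ligatures internally. It assumes the extracted text contains Unicode presentation-form ligatures such as U+FB01, and it uses the JDK's java.text.Normalizer (available from Java 6, so not on the JRE 1.5 listed above). The class and method names are hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Tika or PDFBox.
public class LigatureSketch {

    // NFKC applies compatibility decompositions, which maps ligature
    // code points such as U+FB01 to the plain letters "fi".
    // Caveat: NFKC also rewrites other compatibility characters
    // (for example full-width forms), which may not be wanted.
    static String expandLigatures(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Stand-in for text returned by an extractor: "file flow"
        // written with the fi (U+FB01) and fl (U+FB02) ligatures.
        String extracted = "\uFB01le \uFB02ow";
        System.out.println(expandLigatures(extracted)); // prints "file flow"
    }
}
```

This only post-processes a string, so it would work as a stopgap whatever the parser does with the ICU4J jar on the classpath.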



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
