Michael Reynolds created PDFBOX-4758:
----------------------------------------

             Summary: Text Extractor does not handle common typographic 
ligatures
                 Key: PDFBOX-4758
                 URL: https://issues.apache.org/jira/browse/PDFBOX-4758
             Project: PDFBox
          Issue Type: Bug
          Components: Text extraction
    Affects Versions: 2.0.18, 2.0.1
            Reporter: Michael Reynolds
         Attachments: TestExtractText.java, libreoffice-ligatures-test.pdf, 
msword-ligatures-test.pdf

TextExtractor mishandles typographic ligatures. I've attached test documents 
from both Microsoft Word and LibreOffice.

I've checked PDFBox's output against xPDF on CentOS, and that utility handles 
the ligatures properly, so this appears to be a PDFBox defect.
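
For illustration only, here is a minimal sketch of the output one would expect, 
assuming the extracted text contains Unicode ligature code points such as 
U+FB01 (fi) or U+FB03 (ffi). It uses only the JDK's java.text.Normalizer and is 
not PDFBox code; the class name LigatureDemo and the method expandLigatures are 
hypothetical names made up for this example. If the attached PDFs encode the 
ligatures in some other way (for example, glyphs with no Unicode mapping), this 
sketch would not apply.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of PDFBox: expands Unicode ligature
// code points into their component letters.
public class LigatureDemo {

    // NFKC compatibility normalization maps U+FB00..U+FB04 (ff, fi, fl,
    // ffi, ffl) to plain ASCII letter sequences. Note that NFKC also
    // rewrites other compatibility characters (for example, superscript
    // digits), so it changes more than ligatures.
    public static String expandLigatures(String extracted) {
        return Normalizer.normalize(extracted, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Assumed sample input: "office file" typeset with ffi and fi ligatures.
        String extracted = "o\uFB03ce \uFB01le";
        System.out.println(expandLigatures(extracted)); // prints "office file"
    }
}
```

A post-processing step like this would only be a workaround on the caller's 
side; it says nothing about where in PDFBox the ligatures are lost.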




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
