Michael Reynolds created PDFBOX-4758:
----------------------------------------
Summary: Text Extractor does not handle common typographic
ligatures
Key: PDFBOX-4758
URL: https://issues.apache.org/jira/browse/PDFBOX-4758
Project: PDFBox
Issue Type: Bug
Components: Text extraction
Affects Versions: 2.0.18, 2.0.1
Reporter: Michael Reynolds
Attachments: TestExtractText.java, libreoffice-ligatures-test.pdf,
msword-ligatures-test.pdf
TextExtractor mishandles typographic ligatures. I've attached test documents
from both Microsoft Word and LibreOffice.
I've checked PDFBox's output against Xpdf on CentOS. That utility handles the
ligatures properly, so this appears to be a PDFBox defect.
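
As a possible stopgap, extracted text can be post-processed to expand ligature
characters. The sketch below assumes the ligatures come through as the Unicode
presentation forms U+FB00 to U+FB06, which this report does not confirm. The
class and method names are hypothetical and not part of PDFBox; only the JDK's
java.text.Normalizer is used.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of PDFBox: expands Latin ligature code points
// in a string that has already been extracted from a PDF.
class LigatureNormalizer {

    // Replaces each character in U+FB00..U+FB06 with its NFKC compatibility
    // form (for example U+FB01 becomes "fi"). All other characters are copied
    // through unchanged, so unrelated compatibility characters are not altered.
    static String expandLigatures(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= '\uFB00' && c <= '\uFB06') {
                out.append(Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKC));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed input: "efficient flow" with the "ffi" and "fl" ligatures.
        System.out.println(expandLigatures("e\uFB03cient \uFB02ow"));
    }
}
```

This only helps if the extractor emits the ligature code points at all. If the
characters are dropped or mapped to something else, the fix has to happen in
the extraction itself.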
--
This message was sent by Atlassian Jira
(v8.3.4#803005)