Tim Allison created PDFBOX-2377:
-----------------------------------
Summary: Apparent regression in character mapping in a few files
from govdocs1
Key: PDFBOX-2377
URL: https://issues.apache.org/jira/browse/PDFBOX-2377
Project: PDFBox
Issue Type: Bug
Reporter: Tim Allison
On a small number of test files in a 50k-file sample of PDFs from govdocs1,
some characters appear to no longer be extracted correctly in 1.8.7. I ran
PDFBox's app.jar with ExtractText under both versions:
{noformat}
764949.pdf
1.8.6: Lang, Astrophysical Data: Planets and Stars
1.8.7: Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,
{noformat}
and
{noformat}
312888.pdf
1.8.6: Self-Assessment & Capability Description
1.8.7: Seff-Ammemmmehn & Cajabcfcns Demclcjncih
{noformat}
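To find similar cases across a large sample, one option is to compare the text extracted by the two versions position by position and flag pairs whose length is nearly unchanged but where many characters differ. The following is only a rough sketch of that idea, not how the files above were found. The function names, the 0.2 mismatch threshold, and the 10% length tolerance are all assumptions chosen for illustration.

```python
def mismatch_ratio(old: str, new: str) -> float:
    """Fraction of positions at which the two strings differ.

    Positions past the end of the shorter string count as mismatches.
    """
    longest = max(len(old), len(new))
    if longest == 0:
        return 0.0
    same = sum(1 for a, b in zip(old, new) if a == b)
    return 1.0 - same / longest


def looks_like_mapping_regression(old: str, new: str,
                                  threshold: float = 0.2) -> bool:
    """Heuristic (assumed thresholds): similar length, many characters changed in place."""
    if old == new:
        return False
    longest = max(len(old), len(new))
    similar_length = abs(len(old) - len(new)) <= 0.1 * longest
    return similar_length and mismatch_ratio(old, new) >= threshold


if __name__ == "__main__":
    # Strings taken from the 764949.pdf example in this report.
    old = "Lang, Astrophysical Data: Planets and Stars"
    new = "Lang, AefdaphyeiUSl DSfS: PlSnefe Snd EfSde,"
    print(looks_like_mapping_regression(old, new))
```

A position-wise comparison suits this symptom because the garbled output keeps roughly the same length and word boundaries. It would not catch regressions that insert or drop text.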
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)