[
https://issues.apache.org/jira/browse/PDFBOX-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230381#comment-14230381
]
John Hewson commented on PDFBOX-2532:
-------------------------------------
I'm not sure what you mean by "internal font mapping". If you mean the embedded
Encoding inside a Type1C font, it is certainly appropriate to use that when the
PDF does not provide any further information.
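
A minimal sketch of the fallback order described above, for illustration only. The names `MappingFallback`, `FontInfo`, `Source` and `chooseMapping` are hypothetical and are not PDFBox classes or methods. The ordering (ToUnicode first, then the PDF-level Encoding, then the Encoding embedded in the font program) is an assumption about how such a lookup could be arranged, not a description of what PDFBox does.

```java
// Hypothetical illustration only; none of these types are part of PDFBox.
final class MappingFallback {

    // Where the character mapping used for text extraction would come from.
    enum Source { TO_UNICODE, PDF_ENCODING, EMBEDDED_FONT_ENCODING, NONE }

    // Hypothetical summary of what the PDF font dictionary and the
    // embedded font program make available.
    static final class FontInfo {
        final boolean hasToUnicode;        // font dictionary has a ToUnicode CMap
        final boolean hasPdfEncoding;      // font dictionary has an Encoding entry
        final boolean hasEmbeddedEncoding; // font program (e.g. Type1C) carries its own Encoding

        FontInfo(boolean hasToUnicode, boolean hasPdfEncoding, boolean hasEmbeddedEncoding) {
            this.hasToUnicode = hasToUnicode;
            this.hasPdfEncoding = hasPdfEncoding;
            this.hasEmbeddedEncoding = hasEmbeddedEncoding;
        }
    }

    // Prefer information supplied by the PDF itself; fall back to the
    // font's embedded Encoding only when the PDF provides nothing further.
    static Source chooseMapping(FontInfo font) {
        if (font.hasToUnicode) {
            return Source.TO_UNICODE;
        }
        if (font.hasPdfEncoding) {
            return Source.PDF_ENCODING;
        }
        if (font.hasEmbeddedEncoding) {
            return Source.EMBEDDED_FONT_ENCODING;
        }
        return Source.NONE;
    }

    public static void main(String[] args) {
        // The case under discussion: no ToUnicode, no Encoding in the PDF,
        // but the embedded font program has its own Encoding.
        FontInfo font = new FontInfo(false, false, true);
        System.out.println(chooseMapping(font));
    }
}
```

Whether the embedded Encoding yields readable text is a separate question: the sketch only picks a source and does not check that the resulting mapping is meaningful.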
I'm using Acrobat Pro XI (build 11.0.9.29). Copying and pasting the first few
lines gives me:
{code}
7>PFLK>I 9>NH ;BNRF@B
=%;% .BM>NPJBKP LC PEB 3KPBNFLN
9>@FCF@ -L>OP ;@FBK@B >KA 5B>NKFKD -BKPBN
:BOB>N@E 9NLGB@P ;QJJ>NT .B@BJ?BN (&&*
"&++&,-+’$( #&+-&%+$-& !).&)-*+’&,
{code}
Save As... gives the same result.
> Text extraction fails due to the usage of the internal font mapping
> -------------------------------------------------------------------
>
> Key: PDFBOX-2532
> URL: https://issues.apache.org/jira/browse/PDFBOX-2532
> Project: PDFBox
> Issue Type: Bug
> Components: Text extraction
> Affects Versions: 2.0.0
> Reporter: Andreas Lehmkühler
> Fix For: 2.0.0
>
> Attachments: PDFBOX2247-701542.pdf
>
>
> If a PDF doesn't provide any mapping (neither an Encoding nor a ToUnicode
> mapping), we have to decide where to get a suitable mapping ourselves. We
> can't use the internal font mapping of the Type1C font, as it doesn't work in
> every case; see PDFBOX-2377.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)