[
https://issues.apache.org/jira/browse/PDFBOX-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045708#comment-17045708
]
Tilman Hausherr commented on PDFBOX-4785:
-----------------------------------------
Yes, it is indeed related; see the /ToUnicode stream of the first font:
{code:java}
<1b70> <1b71> <66FF>
{code}
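See the comment by [~mkl] in PDFBOX-4661. This is an incorrect PDF: the third
token already ends in FF, but the range <1b70>..<1b71> has two elements, so
the second one (0x1b71, i.e. CID 7025) would need the last byte to go past FF.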
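To illustrate the overflow, here is a minimal sketch of the check in plain Java. It is not PDFBox code; the class and method names (BfRangeCheck, lastByteOverflows) are hypothetical. It assumes, as I understand the bfrange rule in the PDF specification, that only the last byte of the destination string is incremented across the range and therefore must not pass 0xFF.

```java
// Sketch only: BfRangeCheck and lastByteOverflows are hypothetical names,
// not part of the PDFBox or FontBox API.
final class BfRangeCheck {

    /**
     * Returns true if a bfrange entry <srcLo> <srcHi> <dst> would push the
     * last byte of dst past 0xFF while stepping through the range.
     */
    static boolean lastByteOverflows(int srcLo, int srcHi, byte[] dst) {
        if (srcHi < srcLo || dst.length == 0) {
            throw new IllegalArgumentException("malformed bfrange entry");
        }
        int count = srcHi - srcLo + 1;              // number of codes in the range
        int lastByte = dst[dst.length - 1] & 0xFF;  // unsigned value of the final byte
        return lastByte + (count - 1) > 0xFF;
    }

    public static void main(String[] args) {
        // Entry from the font in this issue: <1b70> <1b71> <66FF>
        byte[] dst = {(byte) 0x66, (byte) 0xFF};
        System.out.println(lastByteOverflows(0x1b70, 0x1b71, dst));
    }
}
```

For the entry above the range holds two codes and the last byte starts at FF, so the check reports an overflow. The same range with <66FE> as the destination would be fine.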
> No Unicode mapping with MS-Mincho
> ---------------------------------
>
> Key: PDFBOX-4785
> URL: https://issues.apache.org/jira/browse/PDFBOX-4785
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.18, 2.0.19
> Reporter: Ryosuke Fujita
> Priority: Major
> Attachments: E02779_convocation_notice_p14.pdf
>
>
> ExtractText on the attached PDF fails as of v2.0.18, while v2.0.17 succeeds.
> The error message is as follows, and the character "最" (CID+7025) cannot be extracted.
> FEB 26, 2020 10:32:29 AM org.apache.pdfbox.pdmodel.font.PDType0Font toUnicode
> WARNING: No Unicode mapping for CID+7025 (7025) in font NAEGKL+MS-Mincho
> This may be related to PDFBOX-4661?