[
https://issues.apache.org/jira/browse/PDFBOX-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863504#comment-16863504
]
chunlinyao edited comment on PDFBOX-4570 at 6/13/19 10:17 PM:
--------------------------------------------------------------
Yes, MS PMincho has drawn this glyph with diagonal lines since Windows Vista.
The old (CP932) mapping sends both U+2016 and U+2225 to the same code, J+8161,
so the old MS PMincho contains only one glyph for U+2225. Adobe's
UniJIS-UCS2-H likewise maps both U+2016 and U+2225 to CID 666.
JIS X 0213 maps U+2016 to J+8161 and U+2225 to J+81d2, and the newer MS PMincho
contains two glyphs. Adobe's UniJIS-UTF16-H was changed accordingly: it maps
U+2016 to CID 666 and U+2225 to CID 15489.
{code:bash}
$ echo "0000 2016 2225" | xxd -r |iconv -f utf-16be -t cp932 |xxd
00000000: 8161 8161 .a.a
$ echo "0000 2016 2225" | xxd -r |iconv -f utf-16be -t shift_jisx0213 |xxd
00000000: 8161 81d2 .a..
{code}
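For readers without iconv or xxd, the same comparison can be sketched in Python. This assumes that CPython's built-in `cp932` and `shift_jisx0213` codecs agree with the iconv tables used above. The helper name `encode_hex` is made up for this example.

```python
# Encode U+2016 (DOUBLE VERTICAL LINE) and U+2225 (PARALLEL TO) with both
# codecs and show the resulting byte sequences as hex strings.
CHARS = "\u2016\u2225"


def encode_hex(text: str, codec: str) -> list:
    """Return, for each character, the hex form of its encoded bytes."""
    return [ch.encode(codec).hex() for ch in text]


# Legacy Microsoft mapping: both characters are expected to share one code.
cp932_codes = encode_hex(CHARS, "cp932")

# JIS X 0213 mapping: the two characters are expected to get separate codes.
jisx0213_codes = encode_hex(CHARS, "shift_jisx0213")

print("cp932:         ", cp932_codes)
print("shift_jisx0213:", jisx0213_codes)
```

If the codecs behave like iconv, the first list repeats one code and the second list holds two different codes.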
If users really require U+2225, we should suggest they change to UniJIS-UTF16-H
or embed the fonts.
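Why switching the CMap helps can be shown with a toy model. The two dictionaries below hold only the entries quoted in this comment and are not complete CMaps. The names `to_cids` and `is_lossless` are invented for illustration.

```python
# Toy excerpts of the two predefined CMaps: Unicode code point -> CID.
# Only the two entries mentioned in this comment are included.
UNIJIS_UCS2_H = {0x2016: 666, 0x2225: 666}
UNIJIS_UTF16_H = {0x2016: 666, 0x2225: 15489}


def to_cids(text, cmap):
    """Map every character of `text` to its CID using the given excerpt."""
    return [cmap[ord(ch)] for ch in text]


def is_lossless(cmap):
    """True when no two code points in the excerpt share a CID."""
    return len(set(cmap.values())) == len(cmap)


text = "\u2016\u2225"
ucs2_cids = to_cids(text, UNIJIS_UCS2_H)    # both characters collapse to one CID
utf16_cids = to_cids(text, UNIJIS_UTF16_H)  # the characters stay distinct

print("UniJIS-UCS2-H: ", ucs2_cids, "lossless:", is_lossless(UNIJIS_UCS2_H))
print("UniJIS-UTF16-H:", utf16_cids, "lossless:", is_lossless(UNIJIS_UTF16_H))
```

Once two characters share a CID, nothing downstream of the CMap can tell them apart, which is why the choice has to be made when the PDF is written.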
It seems Adobe Reader bypasses the CMap. Maybe it uses the code from the
document to look up the glyph directly when it knows that the source encoding
is Unicode and that the font should be looked up by Unicode glyph name.
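The difference between the two lookup strategies can be sketched as follows. This is only a model of the guess above, not a description of how Adobe Reader or PDFBox is implemented. The reverse table `CID_TO_UNICODE`, the font table `FONT_CMAP`, and the glyph names are all hypothetical.

```python
# Code -> CID, as quoted for UniJIS-UCS2-H in this comment.
UNIJIS_UCS2_H = {0x2016: 666, 0x2225: 666}

# Assumed reverse table used to find a glyph in a non-embedded font.
CID_TO_UNICODE = {666: 0x2016, 15489: 0x2225}

# Assumed Unicode cmap of a post-Vista MS PMincho; glyph names are made up.
FONT_CMAP = {0x2016: "double_vertical_line", 0x2225: "parallel_to"}


def glyph_via_cid(code):
    """Code -> CID -> Unicode -> glyph: the round trip through the CMap."""
    cid = UNIJIS_UCS2_H[code]
    return FONT_CMAP[CID_TO_UNICODE[cid]]


def glyph_direct(code):
    """Treat the code in the document as Unicode and ask the font directly."""
    return FONT_CMAP[code]


def extracted_text(code):
    """Text extraction still goes through the CID, so U+2225 is lost."""
    return chr(CID_TO_UNICODE[UNIJIS_UCS2_H[code]])


print(glyph_via_cid(0x2225))   # glyph chosen after the CMap round trip
print(glyph_direct(0x2225))    # glyph chosen when the CMap is bypassed
print(hex(ord(extracted_text(0x2225))))
```

In this model the direct path picks the U+2225 glyph while extraction still yields U+2016, which would match the behaviour reported in the issue: correct rendering on Win10 but U+2016 when the text is copied.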
> U+2225 rendered as U+2016 glyph when use UniJIS-UCS2-H and non embedded font
> ----------------------------------------------------------------------------
>
> Key: PDFBOX-4570
> URL: https://issues.apache.org/jira/browse/PDFBOX-4570
> Project: PDFBox
> Issue Type: Improvement
> Components: FontBox
> Affects Versions: 2.0.15
> Environment: Windows 10 64bit, Adobe Reader 2019.012.20034
> Reporter: chunlinyao
> Priority: Minor
> Attachments: correct.png, incorrect.png, u2225.pdf, u2225.png
>
>
> Maybe this is not a bug in PDFBox, but this PDF renders differently than in
> Adobe Reader. It uses the MS PMincho font, which has a glyph for U+2225; the
> glyph in Win10 differs from the one in WinXP (I confirmed that using
> FontForge).
> Adobe Reader 2019.012.20034 on Win10 rendered it correctly, but Adobe
> Reader 2019.012.20034 on macOS rendered it incorrectly (with the MSPMincho
> font installed).
> MuPDF 1.6 on Windows, Chrome, and Firefox all rendered it like PDFBox.
> Although Adobe Reader on Win10 rendered it correctly, when you copy the text
> from the PDF, you get U+2016, not U+2225.
> I suspect Adobe Reader doesn't use UniJIS-UCS2-H to convert Unicode to CID
> and then convert back to Unicode when retrieving glyphs.
> UniJIS-UCS2-H is obsolete. It maps both U+2225 and U+2016 to CID+666;
> changing to UniJIS-UTF16-H works around this problem.
> Is there any possibility of improving PDFBox to render like Adobe Reader?