[ https://issues.apache.org/jira/browse/PDFBOX-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863515#comment-16863515 ]

chunlinyao commented on PDFBOX-4572:
------------------------------------

The mystery bytes are the font name ＭＳ明朝 (MS Mincho) encoded in CP932:
{code}
echo "0000 826c 8272 96BE 92A9" | xxd -r | iconv -f cp932
ＭＳ明朝
{code}
PDF Reference (version 1.6), Appendix H "Compatibility and Implementation Notes", Section 3.2.4:
{quote}5. In Acrobat 4.0 and earlier versions, a name object being treated as text is typically interpreted in a host platform encoding, which depends on the operating system and the local language. For Asian languages, this encoding may be something like Shift-JIS or Big Five. Consequently, it is necessary to distinguish between names encoded this way and ones encoded as UTF-8. Fortunately, UTF-8 encoding is very stylized and its use can usually be recognized. A name that does not conform to UTF-8 encoding rules can instead be interpreted according to host platform encoding.
{quote}
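The fallback rule in the quoted note can be sketched with the same xxd/iconv tools. This is a minimal sketch, not PDFBox code: decode_pdf_name is a hypothetical helper that takes the name's raw bytes as hex (after any #xx escapes are resolved), and it assumes CP932 as the host platform encoding, which the PDF file itself does not record.

{code}
#!/bin/sh
# decode_pdf_name: hypothetical helper, not part of PDFBox.
# Input: raw bytes of a PDF name as hex digits (whitespace allowed),
# after any #xx escapes have been resolved.
# Output: the name as UTF-8 text.
decode_pdf_name() {
  hex=$1
  # Strict UTF-8 check: iconv exits non-zero on an invalid byte sequence,
  # and any partial output is discarded with the failed substitution.
  if text=$(printf '%s' "$hex" | xxd -r -p | iconv -f UTF-8 -t UTF-8 2>/dev/null); then
    printf '%s\n' "$text"
  else
    # Not valid UTF-8: fall back to the assumed host encoding (CP932 here).
    printf '%s' "$hex" | xxd -r -p | iconv -f CP932 -t UTF-8
  fi
}

decode_pdf_name "826c 8272 96be 92a9"   # bytes from sample_ja.pdf: prints ＭＳ明朝
decode_pdf_name "e6988e e69c9d"         # 明朝 encoded as UTF-8: prints 明朝
{code}

The check is only a heuristic: a short CP932 byte string can occasionally also be valid UTF-8, and nothing in the file says which host encoding to fall back to.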
Is there any way to detect which host platform encoding was used?

> Font name not decoded correctly.
> --------------------------------
>
>                 Key: PDFBOX-4572
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4572
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>    Affects Versions: 2.0.15
>            Reporter: chunlinyao
>            Priority: Minor
>         Attachments: sample_ja.pdf
>
>
> The attached file encodes the font name in MS932, and PDFBox decodes it incorrectly.
> Maybe this file is malformed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
