> OK, so let's go through the font: when I decode it with ftdump, I
> get the following entries for name and family:
> 
>    font family (ID 1) [Microsoft] (language=0x0804):
>       "\U+009E\U+004F"
>    full name (ID 4) [Microsoft] (language=0x0804):
>       "\U+009E\U+004F"
> 
> When I read the related data via the freetype-functions, I get back
> 
> string=žÑOSýýýýØ8hW

Uh, oh, please tell us the byte values (in '0xXX' notation)!
Nothing else will survive e-mail encoding and decoding without
distortion.
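One quick way to produce such a dump is sketched below. It assumes the name string has already been copied into a Python `bytes` object, for example from the `string` and `string_len` fields of FreeType's `FT_SfntName`. `hex_dump` is a hypothetical helper, not part of any library.

```python
def hex_dump(data: bytes) -> str:
    """Render raw bytes as space-separated 0xXX tokens (hypothetical helper)."""
    return " ".join(f"0x{b:02X}" for b in data)

# Example: the four bytes of a two-character UTF-16BE string.
print(hex_dump(b"\x00\x9e\x00\x4f"))  # 0x00 0x9E 0x00 0x4F
```

Pure ASCII output like this survives any mail client unchanged, which is the whole point.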

> string_len=4
> 
> ...means the žÑOS-part of the string is valid.  But this in no case
> decodes to 0x9E 0x00 0x4F 0x00!

Welcome to Mojibake hell.  The following possibilities come to my
mind; there are certainly even more ways to screw up.

  (1) Wrong byte order.
  (2) Wrong encoding, for example interpreting GB2312 characters as
      UTF-8.
  (3) Ditto, but mixing it up with UCS-4, or vice versa.

It can also be a combination of (1) to (3).
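To illustrate (1) and (2), here is a small sketch of how the same bytes turn into completely different characters depending on how they are read. The byte sequence is a hypothetical big-endian encoding of the "\U+009E\U+004F" string from the dump, and `code_points` is an illustrative helper; the GB2312 example simply uses the two characters for "Chinese".

```python
from typing import List

raw = b"\x00\x9e\x00\x4f"  # hypothetical bytes for "\U+009E\U+004F", big-endian

def code_points(text: str) -> List[str]:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in text]

# (1) Byte order: the same four bytes read big- vs. little-endian.
print(code_points(raw.decode("utf-16-be")))  # ['U+009E', 'U+004F']
print(code_points(raw.decode("utf-16-le")))  # ['U+9E00', 'U+4F00']

# (2) Wrong encoding: GB2312 bytes fed to a UTF-8 decoder.
gb_bytes = "\u4e2d\u6587".encode("gb2312")
print(gb_bytes.decode("utf-8", errors="replace"))  # only U+FFFD replacement chars
```

Note that both byte orders decode without any error, which is exactly why such bugs are so hard to spot.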

My advice: Forget it.  Either suppress invalid data, or simply follow
the 'garbage in, garbage out' principle.
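If you choose to suppress invalid data, a minimal sketch might look as follows. It assumes a Microsoft-platform name record with Unicode encoding, whose strings are stored as UTF-16BE. The rejection policy (strict decoding, no control characters) is an assumption for illustration, not something FreeType does for you, and `clean_ms_name` is a hypothetical name. Incidentally, U+009E is a C1 control character, so the family name from the dump would be rejected.

```python
import unicodedata
from typing import Optional

def clean_ms_name(raw: bytes) -> Optional[str]:
    """Return the decoded name, or None if it looks invalid (hypothetical policy)."""
    try:
        # Strict UTF-16BE decoding fails on odd lengths and unpaired surrogates.
        text = raw.decode("utf-16-be")
    except UnicodeDecodeError:
        return None
    # Reject empty names and names containing C0/C1 control characters
    # (Unicode category "Cc"), such as U+009E.
    if not text or any(unicodedata.category(c) == "Cc" for c in text):
        return None
    return text

print(clean_ms_name(b"\x00\x9e\x00\x4f"))  # None
```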


    Werner
