Yes, for sure, hex values are more accurate. So ftdump returns "\U+009E\U+004F" which is the correct name, so ftdump is doing something I do not know about.
When I call the get-name function as shown, the returned value is 0x7e 0xd1 0x4f 0x53.

So if this is a Mojibake problem: is ftdump working around it? If yes, how?

> Sent: Friday, 03 September 2021 at 15:16
> From: "Werner LEMBERG" <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Subject: Re: Aw: Re: Re: Native TTF name sometimes contains crap
>
> > OK, so let's go through the font: when I decode it with ftdump, I get the
> > following entries for name and family:
> >
> >   font family (ID 1) [Microsoft] (language=0x0804):
> >     "\U+009E\U+004F"
> >   full name (ID 4) [Microsoft] (language=0x0804):
> >     "\U+009E\U+004F"
> >
> > When I read the related data via the freetype-functions, I get back
> >
> >   string=žÑOSýýýýØ8hW
>
> Uh, oh, please tell us the byte values (in '0xXX' notation)! Everything
> else won't survive e-mail encoding/decoding without distortions.
>
> >   string_len=4
> >
> > ...means the žÑOS-part of the string is valid.
>
> But this in no case decodes to 0x9E 0x00 0x4F 0x00!
>
> Welcome to Mojibake hell. The following possibilities come to my mind;
> there are certainly even more possibilities to screw up.
>
> (1) Wrong byte order.
>
> (2) Wrong encoding, for example interpreting GB2312 characters as UTF-8.
>
> (3) Ditto, but mixing up with UCS4 – or vice versa.
>
> It can also be a combination of (1) to (3).
>
> My advice: Forget it. Either suppress invalid data, or simply follow the
> 'garbage in, garbage out' principle.
>
>     Werner
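
To make the byte-order and encoding possibilities from the quoted list concrete, here is a minimal Python sketch that interprets the four bytes reported in this thread (0x7e 0xd1 0x4f 0x53) in three different ways. It assumes the name record follows the OpenType convention that Microsoft-platform strings are stored as big-endian UTF-16; I have not verified that for this particular font. The helper name `code_points` is made up for illustration and is not part of FreeType or ftdump.

```python
# Bytes as reported in this thread for the name entry (string_len = 4).
raw = bytes([0x7E, 0xD1, 0x4F, 0x53])


def code_points(data, encoding):
    """Decode `data` with `encoding` and list the code points as U+XXXX."""
    return ["U+%04X" % ord(ch) for ch in data.decode(encoding)]


# Assumption: Microsoft-platform name strings are big-endian UTF-16,
# so every two bytes form one code unit, high byte first.
big_endian = code_points(raw, "utf-16-be")

# Possibility (1) from the quoted list: same bytes, wrong byte order.
little_endian = code_points(raw, "utf-16-le")

# A wrong-encoding case: treating every byte as one Windows-1252
# character, which is what printing the buffer as a C string does on
# many Windows setups.
as_cp1252 = raw.decode("cp1252")

print(big_endian)     # ['U+7ED1', 'U+4F53']
print(little_endian)  # ['U+D17E', 'U+534F']
print(as_cp1252)      # ~ÑOS
```

The same four bytes give two code points under either UTF-16 reading, but four unrelated characters when read byte by byte. Note also that the `string` field of `FT_SfntName` is not NUL-terminated and `string_len` counts bytes, so printing it with a plain `%s` can run past the end of the entry; that may be where the trailing characters after "OS" in the earlier output came from.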
