Hello Werner,
seems your crystal ball is not too bad ;-)
OK, here is the part where I'm not sure whether I missed something:
I _only_ make use of name records where the encoding ID is set to
TT_MS_ID_UNICODE_CS. From this I would assume that all data in the related
"string" member comes in the same encoding and therefore has to be decoded
in the same way. Is this correct?
What I notice is that this holds for all Russian fonts I have and for
about 50% of the Chinese fonts. But when decoding the "string" data of the
remaining 50% of the Chinese fonts (which also have the encoding ID
TT_MS_ID_UNICODE_CS), I get the mentioned crap. So it seems there is some
other property one has to check when decoding the names?
The code I'm using is quite simple:
    int32_t nameCnt = FT_Get_Sfnt_Name_Count(face);

    for (int32_t i = 0; i < nameCnt; i++)
    {
        FT_SfntName nameData;

        if (FT_Get_Sfnt_Name(face, i, &nameData) != 0)
            continue;

        /* only Microsoft-platform records with Unicode encoding */
        if ((nameData.platform_id != TT_PLATFORM_MICROSOFT) ||
            (nameData.encoding_id != TT_MS_ID_UNICODE_CS))
            continue;

        if ((nameData.language_id == TT_MS_LANGID_CHINESE_TAIWAN) ||
            (nameData.language_id == TT_MS_LANGID_CHINESE_PRC) ||
            (nameData.language_id == TT_MS_LANGID_CHINESE_HONG_KONG) ||
            (nameData.language_id == TT_MS_LANGID_CHINESE_SINGAPORE) ||
            (nameData.language_id == TT_MS_LANGID_CHINESE_MACAO))
        {
            if (nameData.name_id == TT_NAME_ID_FULL_NAME)
            {
                /* do some UTF-16BE decoding here with string and string_len */
            }
            else if ((!fontEntry->chineseName) &&
                     ((nameData.name_id == TT_NAME_ID_FONT_FAMILY) ||
                      (nameData.name_id == TT_NAME_ID_TYPOGRAPHIC_FAMILY)))
            {
                /* do some UTF-16BE decoding here with string and string_len */
            }
        }
        else if ((nameData.language_id == TT_MS_LANGID_RUSSIAN_RUSSIA) ||
                 (nameData.language_id == TT_MS_LANGID_RUSSIAN_MOLDAVIA))
        {
            if (nameData.name_id == TT_NAME_ID_FULL_NAME)
            {
                /* do some UTF-16BE decoding here with string and string_len */
            }
            else if ((!fontEntry->chineseName) &&
                     ((nameData.name_id == TT_NAME_ID_FONT_FAMILY) ||
                      (nameData.name_id == TT_NAME_ID_TYPOGRAPHIC_FAMILY)))
            {
                /* do some UTF-16BE decoding here with string and string_len */
            }
        }
    }
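
The "UTF16 BE decoding" step itself is not shown in the code above. For reference, a minimal sketch of what such a step might look like follows. The helper name utf16be_to_utf8 is hypothetical and not part of FreeType, and the choice of UTF-8 as the output format is an assumption. It relies on two documented properties of FT_SfntName: "string" is not NUL-terminated, and "string_len" counts bytes, not characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FreeType): converts `len` BYTES of
   UTF-16BE in `src` to a NUL-terminated UTF-8 string in `dst`.
   Returns the number of UTF-8 bytes written (without the NUL), or
   (size_t)-1 if `dst` is too small.  Unpaired surrogates are replaced
   by U+FFFD; a trailing odd byte is ignored. */
static size_t utf16be_to_utf8(const unsigned char *src, size_t len,
                              char *dst, size_t dst_size)
{
    size_t in = 0, out = 0;

    while (in + 1 < len) {
        /* big-endian: high byte first */
        uint32_t cp = ((uint32_t)src[in] << 8) | src[in + 1];
        in += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* high surrogate: must be followed by a low surrogate */
            uint32_t lo = 0;
            if (in + 1 < len)
                lo = ((uint32_t)src[in] << 8) | src[in + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                in += 2;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* stray low surrogate */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need + 1 > dst_size) /* +1 keeps room for the NUL */
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (char)cp;
        } else if (need == 2) {
            dst[out++] = (char)(0xC0 | (cp >> 6));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (char)(0xE0 | (cp >> 12));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (char)(0xF0 | (cp >> 18));
            dst[out++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    if (out >= dst_size) /* only reachable if nothing was written */
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

With a local buffer "buf" (also just an illustration), a call would be utf16be_to_utf8(nameData.string, nameData.string_len, buf, sizeof buf). Reading past string_len, or treating string_len as a character count, would pull in whatever bytes happen to follow the name in memory.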
"face" belongs to a valid and opened TTF.
Mike
> Sent: Thursday, 02 September 2021, 13:28
> From: "Werner LEMBERG" <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Subject: Re: Native TTF name sometimes contains crap
>
> > thanks to your help I'm retrieving the original names of TTFs via
> > function FT_Get_Sfnt_Name() for the encode-ID TT_MS_ID_UNICODE_CS
> > successfully. This seems to work well except for some language-ID's of
> > type TT_MS_LANGID_CHINESE_*. There after decoding the string-member I do
> > not get the correct name but a more or less long array with 췍-character
> > (0xCDCD) only. So it seems there is either something wrong or there is a
> > special encoding used? Any idea how to correctly check and fix this?
>
> My crystal ball says that you have to use the right (non-Unicode) character
> encodings like Big5 or GB to read those strings. Otherwise you should show
> an example.
>
> Werner