Aw: Re: Native TTF name sometimes contains crap

virtual_worlds Thu, 02 Sep 2021 06:09:49 -0700

Hello Werner,

seems your crystal ball is not too bad ;-)


OK, here is the part where I'm not sure whether I forgot something:

I _only_ use name records whose encoding ID is TT_MS_ID_UNICODE_CS. From
this I would assume that the data in the corresponding "string" member always
has the same encoding and therefore has to be decoded in the same way. Is
this correct?

What I notice is that this holds for all Russian fonts I have and for about
50% of the Chinese fonts. But when decoding the "string" data of the
remaining 50% of the Chinese fonts (which also have the encoding ID
TT_MS_ID_UNICODE_CS), I get the mentioned crap. So it seems there is some
other property one has to check when decoding the names?

The code I'm using is quite simple:

FT_UInt i;
FT_UInt nameCnt=FT_Get_Sfnt_Name_Count(face);

for (i=0; i<nameCnt; i++)
{
   FT_SfntName nameData;

   if (FT_Get_Sfnt_Name(face,i,&nameData)==0)
   {
      /* only Windows-platform records flagged as Unicode are considered */
      if ((nameData.platform_id==TT_PLATFORM_MICROSOFT) &&
          (nameData.encoding_id==TT_MS_ID_UNICODE_CS))
      {
         if ((nameData.language_id==TT_MS_LANGID_CHINESE_TAIWAN) ||
             (nameData.language_id==TT_MS_LANGID_CHINESE_PRC) ||
             (nameData.language_id==TT_MS_LANGID_CHINESE_HONG_KONG) ||
             (nameData.language_id==TT_MS_LANGID_CHINESE_SINGAPORE) ||
             (nameData.language_id==TT_MS_LANGID_CHINESE_MACAO))
         {
            if (nameData.name_id==TT_NAME_ID_FULL_NAME)
            {
               /* -> do some UTF16 BE decoding here with string and string_len */
            }
            else if ((!fontEntry->chineseName) &&
                     ((nameData.name_id==TT_NAME_ID_FONT_FAMILY) ||
                      (nameData.name_id==TT_NAME_ID_TYPOGRAPHIC_FAMILY)))
            {
               /* -> do some UTF16 BE decoding here with string and string_len */
            }
         }
         else if ((nameData.language_id==TT_MS_LANGID_RUSSIAN_RUSSIA) ||
                  (nameData.language_id==TT_MS_LANGID_RUSSIAN_MOLDAVIA))
         {
            if (nameData.name_id==TT_NAME_ID_FULL_NAME)
            {
               /* -> do some UTF16 BE decoding here with string and string_len */
            }
            else if ((!fontEntry->chineseName) &&
                     ((nameData.name_id==TT_NAME_ID_FONT_FAMILY) ||
                      (nameData.name_id==TT_NAME_ID_TYPOGRAPHIC_FAMILY)))
            {
               /* -> do some UTF16 BE decoding here with string and string_len */
            }
         }
      }
   }
}
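
For reference, the "UTF16 BE decoding" step in the loop above could look
roughly like the sketch below. This is not the exact code from my project,
only a minimal illustration: the function name utf16be_to_utf8 and the choice
of UTF-8 as the output encoding are assumptions made for this example. It
treats its input the way FT_SfntName delivers it, i.e. "string" is a raw byte
array without a terminating NUL and "string_len" is its length in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert big-endian UTF-16 bytes to NUL-terminated
   UTF-8.  src/src_len correspond to FT_SfntName.string/string_len (length
   in BYTES).  Returns the number of bytes written (without the NUL), or
   (size_t)-1 if dst is too small. */
size_t utf16be_to_utf8(const unsigned char *src, size_t src_len,
                       char *dst, size_t dst_cap)
{
    size_t i = 0, out = 0;

    assert(dst != NULL);
    assert(src != NULL || src_len == 0);
    if (dst_cap == 0)
        return (size_t)-1;

    /* A trailing odd byte, if any, is ignored. */
    while (i + 1 < src_len) {
        uint32_t cp = ((uint32_t)src[i] << 8) | src[i + 1];
        size_t need;
        i += 2;

        /* Combine a high/low surrogate pair into one code point. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src_len) {
            uint32_t lo = ((uint32_t)src[i] << 8) | src[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            }
        }
        /* Unpaired surrogates are replaced by U+FFFD. */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need >= dst_cap)      /* keep one byte for the NUL */
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (char)cp;
        } else if (need == 2) {
            dst[out++] = (char)(0xC0 | (cp >> 6));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (char)(0xE0 | (cp >> 12));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (char)(0xF0 | (cp >> 18));
            dst[out++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    dst[out] = '\0';
    return out;
}
```

Inside the loop it would be called as
utf16be_to_utf8(nameData.string, nameData.string_len, buf, sizeof buf),
with buf being a caller-provided char array.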

"face" belongs to a valid and opened TTF.

Mike



> Sent: Thursday, 02 September 2021 at 13:28
> From: "Werner LEMBERG" <[email protected]>
> To: [email protected]
> Cc: [email protected]
> Subject: Re: Native TTF name sometimes contains crap
>
> > thanks to your help I'm retrieving the original names of TTFs via
> > function FT_Get_Sfnt_Name() for the encode-ID TT_MS_ID_UNICODE_CS
> > successfully. This seems to work well except for some language-ID's
> > of type TT_MS_LANGID_CHINESE_*. There after decoding the
> > string-member I do not get the correct name but a more or less long
> > array with 췍-character (0xCDCD) only. So it seems there is either
> > something wrong or there is a special encoding used? Any idea how to
> > correctly check and fix this?
>
> My crystal ball says that you have to use the right (non-Unicode)
> character encodings like Big5 or GB to read those strings. Otherwise
> you should show an example.
>
>     Werner
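
Regarding the hint about non-Unicode encodings quoted above: whether a name
record may be decoded as UTF-16BE at all can be decided from its
(platform_id, encoding_id) pair before touching the string. The following is
only a sketch under stated assumptions. The EX_* constants and the function
ex_classify_name_record are made up for this example; the numeric values
mirror the Windows-platform encoding IDs of the 'name' table, which FreeType
exposes as TT_PLATFORM_MICROSOFT and TT_MS_ID_* in its ttnameid.h header.
They are redefined locally so that the example compiles without FreeType.

```c
/* Hypothetical local copies of the Windows-platform IDs (assumption: the
   values match TT_PLATFORM_MICROSOFT / TT_MS_ID_* from FreeType). */
enum {
    EX_PLATFORM_MICROSOFT = 3,
    EX_MS_ID_UNICODE_CS   = 1,   /* Unicode BMP                */
    EX_MS_ID_SJIS         = 2,   /* Shift JIS                  */
    EX_MS_ID_PRC          = 3,   /* GB (PRC)                   */
    EX_MS_ID_BIG_5        = 4,   /* Big5                       */
    EX_MS_ID_WANSUNG      = 5,
    EX_MS_ID_JOHAB        = 6,
    EX_MS_ID_UCS_4        = 10   /* Unicode, full repertoire   */
};

typedef enum {
    EX_ENC_UTF16BE,      /* string can go through a UTF-16BE decoder      */
    EX_ENC_LEGACY_CJK,   /* record is flagged with a legacy CJK encoding  */
    EX_ENC_UNKNOWN       /* not classified by this sketch                 */
} ex_name_encoding;

/* Classify a name record by its platform and encoding IDs. */
ex_name_encoding ex_classify_name_record(unsigned platform_id,
                                         unsigned encoding_id)
{
    if (platform_id != EX_PLATFORM_MICROSOFT)
        return EX_ENC_UNKNOWN;   /* other platforms are left out here */

    switch (encoding_id) {
    case EX_MS_ID_UNICODE_CS:
    case EX_MS_ID_UCS_4:
        return EX_ENC_UTF16BE;
    case EX_MS_ID_SJIS:
    case EX_MS_ID_PRC:
    case EX_MS_ID_BIG_5:
    case EX_MS_ID_WANSUNG:
    case EX_MS_ID_JOHAB:
        return EX_ENC_LEGACY_CJK;
    default:
        return EX_ENC_UNKNOWN;
    }
}
```

A record classified as EX_ENC_LEGACY_CJK would need a converter for the
respective encoding (for example Big5 or GB) instead of a UTF-16BE decoder.
Note that this does not by itself explain garbage in records that already
carry TT_MS_ID_UNICODE_CS; for those, looking at the raw bytes of "string"
is probably the next step.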
