Peter Clifton wrote:
> On Fri, 2008-08-29 at 01:16 -0400, der Mouse wrote:
>   
>>> 2) Scan all the strings in the current document for non-latin-1 (e.g. 
>>> UTF-8) characters
>>>       
>> I must have misunderstood something here.  A string of octets may
>> simultaneously be valid Latin-1 text and valid UTF-8 text (for example,
>> 0xce 0xa3 is UTF-8 for Greek capital sigma, U+03A3, but is also Latin-1
>> for the two-character sequence capital-I-circumflex pound-sign).
>>
>> Or does the "current document" being scanned store text in some way
>> which does not have this ambiguity?
>>     
>
> I presume that Mike means we scan for bytes with value >=128 in the UTF-8
> string. The 7-bit ASCII codes should be common to Latin-1 and Unicode.
>
>   
This is the code from f_print.c that does the magic:

  aux = o_current->text->string;
  while (aux && ((gunichar) (*aux) != 0)) {
    current_char = g_utf8_get_char_validated(aux, -1);
    if (current_char >= 127) {
      found = 0;
      i = 0;
      while (i < count) {
        if (table[i] == current_char)
          found = 1;
        i++;
      }
      if (!found) {
        if (count < 128)
          table[count++] = current_char;
        else
          s_log_message(_("Too many UTF-8 characters, cannot print\n"));
      }
    }
    aux = g_utf8_find_next_char(aux, NULL);
  }

g_utf8_get_char_validated() decodes the UTF-8 sequence at aux and returns
the 'raw' Unicode code point.  If that value is 127 or greater, the loop
looks for it in our table and adds it if it is missing, up to a limit of
128 entries.
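
For anyone who wants to try the scan outside of gschem, here is a rough
standalone sketch of the same idea.  It is not the f_print.c code: it
swaps GLib's g_utf8_get_char_validated() for a small hand-rolled decoder
so that it builds without GLib, and the names decode_utf8,
collect_extra_chars and MAX_EXTRA are made up for this example.  The
decoder is deliberately simplified (it does not reject overlong forms or
surrogates), and unlike the original it stops on malformed input instead
of logging a message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTRA 128  /* same limit as the table in the quoted code */

/* Decode one UTF-8 sequence at s, store the code point in *cp and return
 * the number of bytes consumed, or 0 if the sequence is malformed.
 * Simplified: overlong forms and surrogates are not rejected. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    size_t len, i;

    if (s[0] < 0x80)                { *cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;  /* stray continuation byte or invalid lead byte */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;  /* not a continuation byte (also catches an early NUL) */
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Collect the distinct code points >= 127 found in text into table.
 * Returns the number of entries stored, or -1 if the table would
 * overflow or the input is not well-formed UTF-8. */
static int collect_extra_chars(const char *text, uint32_t table[MAX_EXTRA])
{
    const unsigned char *p = (const unsigned char *)text;
    int count = 0;

    assert(text != NULL && table != NULL);
    while (*p != 0) {
        uint32_t cp;
        size_t n = decode_utf8(p, &cp);
        int i, found = 0;

        if (n == 0)
            return -1;
        if (cp >= 127) {
            for (i = 0; i < count; i++) {
                if (table[i] == cp) { found = 1; break; }
            }
            if (!found) {
                if (count == MAX_EXTRA)
                    return -1;
                table[count++] = cp;
            }
        }
        p += n;
    }
    return count;
}
```

Calling collect_extra_chars() on a string containing the bytes 0xCE 0x9C
twice should leave a single U+039C entry in the table, since repeats are
skipped by the linear search.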

In the same file you will find the hash table entry for the Greek capital
Mu character:

{ GUINT_TO_POINTER (0x039C), "/Mu" },


Thus, if the Unicode code point for capital 'Mu', U+039C, appears in the
input source, then some entry in the font map at an index >=128 will get
remapped to '/Mu'.  If it doesn't render, it's because the PostScript
font is missing the glyph.
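
To make the remapping step concrete, here is a small sketch of the
lookup.  It uses a plain array instead of the GLib hash table in
f_print.c, and the names glyph_entry, glyph_name_for, is_mapped and
encoding_slot are hypothetical.  Only the Mu entry comes from the file;
Alpha and Beta are added for illustration.  The assumption that table
slot i ends up at font-map index 128 + i is mine, the message above only
says the index is >=128.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct glyph_entry {
    uint32_t    codepoint;  /* Unicode code point */
    const char *name;       /* PostScript glyph name */
};

/* Illustrative subset; only the Mu entry is taken from the message. */
static const struct glyph_entry glyph_names[] = {
    { 0x0391, "/Alpha" },
    { 0x0392, "/Beta"  },
    { 0x039C, "/Mu"    },
};

/* Return the glyph name for cp, or "/.notdef" if we have no entry. */
static const char *glyph_name_for(uint32_t cp)
{
    size_t i;

    for (i = 0; i < sizeof glyph_names / sizeof glyph_names[0]; i++) {
        if (glyph_names[i].codepoint == cp)
            return glyph_names[i].name;
    }
    return "/.notdef";
}

/* True if cp has a known glyph name in the table above. */
static int is_mapped(uint32_t cp)
{
    return strcmp(glyph_name_for(cp), "/.notdef") != 0;
}

/* Assumed layout: slot i of the collected table is placed at font-map
 * index 128 + i, leaving the 7-bit ASCII range untouched. */
static int encoding_slot(int table_index)
{
    assert(table_index >= 0 && table_index < 128);
    return 128 + table_index;
}
```

A code point with no entry falls back to /.notdef here, which is a
different failure from the one described above: a known name such as /Mu
can still fail to render if the selected font has no glyph by that name.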



