Peter Clifton wrote:
> On Fri, 2008-08-29 at 01:16 -0400, der Mouse wrote:
>
>>> 2) Scan all the strings in the current document for non-latin-1 (e.g.
>>> UTF-8) characters
>>>
>> I must have misunderstood something here. A string of octets may
>> simultaneously be valid Latin-1 text and valid UTF-8 text (for example,
>> 0xde 0xa3 is valid UTF-8 (for U+07A3), but is also Latin-1
>> for the two-character sequence capital-thorn pound-sign).
>>
>> Or does the "current document" being scanned store text in some way
>> which does not have this ambiguity?
>>
>
> I presume that Mike means we scan for bytes with value >127 in the UTF-8
> string. The 7-bit ASCII codes are common to Latin-1 and UTF-8.
>
>
This is the code from f_print.c that does the magic:
  aux = o_current->text->string;
  while (aux && ((gunichar) (*aux) != 0)) {
    current_char = g_utf8_get_char_validated (aux, -1);
    if (current_char >= 127) {
      found = 0;
      i = 0;
      while (i < count) {
        if (table[i] == current_char)
          found = 1;
        i++;
      }
      if (!found) {
        if (count < 128)
          table[count++] = current_char;
        else
          s_log_message (_("Too many UTF-8 characters, cannot print\n"));
      }
    }
    aux = g_utf8_find_next_char (aux, NULL);
  }
g_utf8_get_char_validated() is what does the real work: it decodes the
next UTF-8 sequence into a raw Unicode code point. If that code point
is 127 or above, the code looks for it in our table, adding it if it
is missing.
In the same file you will find the hash table entry for the mu character:
{ GUINT_TO_POINTER (0x039C), "/Mu" },
Thus, if the Unicode code point for 'Mu' (U+039C) appears in the input,
some entry in the font map at an index >= 128 gets remapped to '/Mu'.
If the character still doesn't render, it is because the PostScript
font is missing that glyph.
_______________________________________________
geda-dev mailing list
[email protected]
http://www.seul.org/cgi-bin/mailman/listinfo/geda-dev