[Bug c/67224] UTF-8 support for identifier names in GCC

ejolson at unr dot edu Thu, 20 Aug 2015 15:16:03 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224


--- Comment #20 from Eric <ejolson at unr dot edu> ---
I've been looking at the code in lex_identifier as well as what goes on in
forms_identifier_p and so forth.  At some point each identifier needs to be
stored in the symbol table using ht_lookup_with_hash.  Proper functioning
requires that UTF-8 and UCN representations of the same Unicode characters are
treated as the same symbol.  Thus, there needs to be some point at which the
identifiers are regularized to be either all UTF-8 or all UCN-escaped ASCII.
As gcc is working with UCNs right now, the obvious implementation allocates
temporary memory to hold the UCN-escaped ASCII version of a UTF-8 identifier
and then frees it again after calling ht_lookup.  Any comments would be
appreciated.
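
To make the idea concrete, here is a rough standalone sketch of the
regularization step.  It is not libcpp code: decode_utf8, utf8_to_ucn and
same_identifier are hypothetical names made up for illustration, the
\uXXXX / \UXXXXXXXX uppercase-hex spelling is an assumed canonical form, and
a plain strcmp stands in for the real symbol table lookup.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode one UTF-8 sequence at s (len bytes available).  Stores the code
   point in *cp and returns the number of bytes consumed, or 0 if the
   sequence is malformed, overlong, a surrogate, or out of range.  */
static size_t decode_utf8(const unsigned char *s, size_t len, unsigned long *cp)
{
    size_t n, i;
    unsigned long c, min;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
    else
        return 0;
    if (len < n)
        return 0;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return n;
}

/* Return a malloc'd, NUL-terminated copy of src in which every non-ASCII
   character is spelled as a UCN.  ASCII bytes, including any UCNs already
   present, are copied unchanged, so existing UCNs are assumed to use the
   same spelling already.  Returns NULL on malformed input or allocation
   failure.  The caller frees the result.  */
static char *utf8_to_ucn(const char *src, size_t len)
{
    /* Worst case is a 2-byte sequence growing to 6 bytes (\uXXXX).  */
    char *buf, *out;
    size_t i = 0;

    assert(src != NULL);
    buf = malloc(3 * len + 1);
    if (buf == NULL)
        return NULL;
    out = buf;
    while (i < len) {
        unsigned long cp;
        size_t n = decode_utf8((const unsigned char *) src + i, len - i, &cp);
        if (n == 0) {
            free(buf);
            return NULL;
        }
        if (cp < 0x80)
            *out++ = (char) cp;
        else if (cp <= 0xFFFF)
            out += sprintf(out, "\\u%04lX", cp);
        else
            out += sprintf(out, "\\U%08lX", cp);
        i += n;
    }
    *out = '\0';
    return buf;
}

/* Compare two identifiers after regularizing both to UCN-escaped ASCII.
   Returns 1 if they name the same symbol, 0 if not, -1 on error.  The
   temporaries are freed right after the comparison, mirroring the
   allocate / look up / free pattern described above.  */
static int same_identifier(const char *a, const char *b)
{
    char *na = utf8_to_ucn(a, strlen(a));
    char *nb = utf8_to_ucn(b, strlen(b));
    int result = (na && nb) ? (strcmp(na, nb) == 0) : -1;

    free(na);
    free(nb);
    return result;
}
```

With this, same_identifier("caf\xC3\xA9", "caf\\u00E9") treats the UTF-8 and
UCN spellings as one symbol.  A real patch would presumably also need to
canonicalize UCNs that already appear in the source (hex digit case, \u
versus \U), which this sketch leaves out.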
