https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67224
--- Comment #20 from Eric <ejolson at unr dot edu> --- I've been looking at the code in lex_identifier as well as what goes on in forms_identifier_p and so forth. At some point each identifier needs to be stored in the symbol table using ht_lookup_with_hash. Proper functioning requires that UTF-8 and UCN representations of the same Unicode characters are treated as the same symbol. Thus, there needs to be some point at which the identifiers are regularized to be either all UTF-8 or all UCN-escaped ASCII. As gcc is working with UCNs right now, the obvious implementation allocates temporary memory to hold the UCN-escaped ASCII version of a UTF-8 identifier and then frees it again after calling ht_lookup. Any comments would be appreciated.
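
To make the regularization step concrete, here is a minimal standalone sketch of the "convert to UCN-escaped ASCII in a temporary buffer, use it, then free it" idea. It is not cpplib code: the names decode_utf8, utf8_to_ucn and same_identifier are hypothetical, a plain strcmp stands in for the hash-table lookup, and the choice of upper-case hex digits is an assumption. A real implementation would also have to canonicalize UCNs already present in the source (for example \u00e9 versus \u00E9) and reject overlong or otherwise invalid UTF-8, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode one UTF-8 sequence from s (len bytes available).
   Stores the code point in *cp and returns the number of bytes consumed,
   or 0 if the sequence is truncated or malformed.
   Simplification: overlong encodings and surrogates are not rejected. */
static size_t decode_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n, i;
    uint32_t v;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        n = 2; v = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        n = 3; v = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        n = 4; v = s[0] & 0x07;
    } else {
        return 0;
    }
    if (len < n)
        return 0;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */
        v = (v << 6) | (s[i] & 0x3F);
    }
    *cp = v;
    return n;
}

/* Hypothetical helper: return a malloc'd, NUL-terminated copy of src in
   which every non-ASCII character is written as \uXXXX or \UXXXXXXXX.
   ASCII bytes (including any UCNs already in src) are copied unchanged.
   Returns NULL on allocation failure or malformed input; caller frees. */
char *utf8_to_ucn(const char *src, size_t len)
{
    size_t i = 0, o = 0;
    char *out;

    assert(src != NULL);
    /* Worst case is a 2-byte sequence growing to 6 bytes, i.e. 3x. */
    out = malloc(3 * len + 1);
    if (out == NULL)
        return NULL;
    while (i < len) {
        uint32_t cp;
        size_t n = decode_utf8((const unsigned char *)src + i, len - i, &cp);
        if (n == 0) {
            free(out);
            return NULL;
        }
        if (cp < 0x80)
            out[o++] = (char)cp;
        else if (cp <= 0xFFFF)
            o += (size_t)sprintf(out + o, "\\u%04X", (unsigned)cp);
        else
            o += (size_t)sprintf(out + o, "\\U%08X", (unsigned)cp);
        i += n;
    }
    out[o] = '\0';
    return out;
}

/* Stand-in for the symbol-table lookup: two spellings name the same
   identifier if their regularized forms compare equal.  The temporary
   buffers are freed as soon as the comparison is done. */
int same_identifier(const char *a, const char *b)
{
    char *na = utf8_to_ucn(a, strlen(a));
    char *nb = utf8_to_ucn(b, strlen(b));
    int same = na != NULL && nb != NULL && strcmp(na, nb) == 0;

    free(na);
    free(nb);
    return same;
}
```

With this sketch, same_identifier("caf\xC3\xA9", "caf\\u00E9") is nonzero, since the UTF-8 bytes C3 A9 decode to U+00E9. The cost is one allocation and one free per identifier containing extended characters; a pure-ASCII identifier could skip the conversion entirely and be looked up in place.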