Comment #2 from Andrei Alexandrescu 2009-10-30 11:40:05 PDT ---
(In reply to comment #1)
> As very clearly states,
> the allowed characters in identifiers are those defined in the C99 standard,
> ISO/IEC 9899:1999(E) Annex D. Have a look at it:
> ９, code point 0xff19, is not in that list. The maximum one is 0xd7a3, in fact.
> This is not a bug, this is an enhancement.
> However, rather than an arbitrary and frozen list, I /would/ prefer basing it
> simply on Unicode properties, such as Java's choice: identifiers may start with
> letters or numeric letters, and may contain, in addition to those, connecting
> punctuation, decimal digits, and combining and non-spacing marks. In other
> words:
> Identifiers may start with code points from the general categories Ll, Lm, Lo,
> Lt, Lu, Nl.
> Identifiers may contain code points from the general categories Ll, Lm, Lo, Lt,
> Lu, Mc, Mn, Nd, Nl, No, Pc.
> Java also allows Cc and Cf, of whose usefulness I'm not so convinced. These are
> control characters and things like "soft hyphen", which isn't even supposed to
> be displayed unless the word line-wraps. Too much potential for confusion 

Oh ok. Thanks Matti. I'm leaving this as an enhancement request. Currently the
error message is:

invalid UTF-8 sequence
unsupported char 0x99

This is factually incorrect because the UTF-8 sequence is valid. I suggest:

Unicode character 0xFF19 not allowed in a symbol
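
To make the point concrete, here is a rough Python sketch (not compiler code). It shows that 0xFF19 encodes to a well-formed three-byte UTF-8 sequence whose last byte is the 0x99 from the current message, and it checks the character against the general categories listed in the quoted comment. The helper names (utf8_hex, may_start, may_continue, symbol_error) are made up for illustration, and the category sets are the proposal from comment #1, not what the compiler implements today.

```python
import unicodedata

# General categories taken from the quoted proposal (an assumption for this
# sketch, not the current identifier rules).
START_CATEGORIES = {"Ll", "Lm", "Lo", "Lt", "Lu", "Nl"}
PART_CATEGORIES = START_CATEGORIES | {"Mc", "Mn", "Nd", "No", "Pc"}


def utf8_hex(ch):
    """Return the UTF-8 encoding of one character as space-separated hex bytes."""
    return " ".join("%02x" % b for b in ch.encode("utf-8"))


def may_start(ch):
    """True if the character could begin an identifier under the proposal."""
    return unicodedata.category(ch) in START_CATEGORIES


def may_continue(ch):
    """True if the character could appear after the first position."""
    return unicodedata.category(ch) in PART_CATEGORIES


def symbol_error(ch):
    """Format the suggested diagnostic using the code point, not a raw byte."""
    return "Unicode character 0x%04X not allowed in a symbol" % ord(ch)


if __name__ == "__main__":
    ch = "\uff19"  # FULLWIDTH DIGIT NINE
    print(utf8_hex(ch))                # bytes of the (valid) UTF-8 sequence
    print(unicodedata.category(ch))    # general category of the character
    print(may_start(ch), may_continue(ch))
    print(symbol_error(ch))
```

The diagnostic is built from the decoded code point rather than from a trailing byte, which is why it reports 0xFF19 instead of 0x99.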


