http://d.puremagic.com/issues/show_bug.cgi?id=3455
--- Comment #2 from Andrei Alexandrescu <and...@metalanguage.com> 2009-10-30 11:40:05 PDT ---
(In reply to comment #1)
> As http://www.digitalmars.com/d/1.0/lex.html#identifier very clearly states,
> the allowed characters in identifiers are those defined in the C99 standard,
> ISO/IEC 9899:1999(E) Annex D. Have a look at it:
> http://www.open-std.org/JTC1/SC22/wg14/www/docs/n1124.pdf
>
> 9, code point 0xff19, is not in that list. The maximum one is 0xd7a3, in
> fact.
> This is not a bug, this is an enhancement.
>
> However, rather than an arbitrary and frozen list, I /would/ prefer basing it
> simply on Unicode properties, such as Java's choice: identifiers may start
> with letters or numeric letters, and may contain, in addition to those,
> connecting punctuation, decimal digits, and combining and non-spacing marks.
> In other words:
>
> Identifiers may start with code points from the general categories Ll, Lm, Lo,
> Lt, Lu, Nl.
>
> Identifiers may contain code points from the general categories Ll, Lm, Lo,
> Lt, Lu, Mc, Mn, Nd, Nl, No, Pc.
>
> Java also allows Cc and Cf, of whose usefulness I'm not so convinced. These
> are control characters and things like "soft hyphen", which isn't even
> supposed to be displayed unless the word line-wraps. Too much potential for
> confusion IMHO.

Oh ok. Thanks Matti. I'm leaving this as an enhancement request.

Currently the error message is:

    invalid UTF-8 sequence unsupported char 0x99

This is factually incorrect because the UTF-8 sequence is correct. I suggest
instead:

    Unicode character 0xFF19 not allowed in a symbol

Andrei
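
For illustration only, here is a minimal Python sketch of the category-based rule proposed in comment #1, together with the error wording suggested above. It is not how the D compiler lexes identifiers; the names START_CATEGORIES, PART_CATEGORIES, is_identifier and describe_rejection are hypothetical, and the result depends on the Unicode database shipped with the Python interpreter. Note that under the rule exactly as quoted, "_" (category Pc) may continue an identifier but not start one.

```python
import unicodedata

# General categories taken from the proposal quoted in comment #1.
START_CATEGORIES = {"Ll", "Lm", "Lo", "Lt", "Lu", "Nl"}
PART_CATEGORIES = START_CATEGORIES | {"Mc", "Mn", "Nd", "No", "Pc"}


def is_identifier(name: str) -> bool:
    """Return True if `name` satisfies the proposed category-based rule."""
    if not name:
        return False
    # The first code point must come from the "start" categories.
    if unicodedata.category(name[0]) not in START_CATEGORIES:
        return False
    # Every following code point must come from the wider "part" set.
    return all(unicodedata.category(ch) in PART_CATEGORIES for ch in name[1:])


def describe_rejection(ch: str) -> str:
    """Format the error text suggested in comment #2 for one code point."""
    return f"Unicode character 0x{ord(ch):04X} not allowed in a symbol"


if __name__ == "__main__":
    fullwidth_nine = "\uff19"  # FULLWIDTH DIGIT NINE, general category Nd

    # The UTF-8 encoding is well-formed; its last byte is 0x99.
    print(fullwidth_nine.encode("utf-8").hex())  # ef bc 99 without spaces

    print(is_identifier("x" + fullwidth_nine))   # allowed as a later character
    print(is_identifier(fullwidth_nine + "x"))   # not allowed as the first one
    print(describe_rejection(fullwidth_nine))
```

The encoding check shows why "invalid UTF-8 sequence" is misleading: U+FF19 encodes to the valid three-byte sequence EF BC 99, and the 0x99 in the current message matches its final byte, which suggests (as an assumption, not something confirmed in this thread) that the lexer is reporting a single byte rather than the decoded code point.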