On 9/24/18 12:23 AM, Neia Neutuladh wrote:
> On Monday, 24 September 2018 at 01:39:43 UTC, Walter Bright wrote:
>> On 9/23/2018 3:23 PM, Neia Neutuladh wrote:
>>> Okay, that's why you previously selected C99 as the standard for what
>>> characters to allow. Do you want to update to match C11? It's been
>>> out for the better part of a decade, after all.
>>
>> I wasn't aware it changed in C11.
>
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf page 522 (PDF
> numbering) or 504 (internal numbering).
>
> Outside the BMP, almost everything is allowed, including many things
> that are not currently mapped to any Unicode value. Within the BMP, a
> heck of a lot of stuff is allowed, including a lot that D doesn't
> currently allow.
>
> GCC hasn't even updated to the C99 standard here, as far as I can tell,
> but clang-5.0 is up to date.

I searched around for the current state of symbol names in C and found
some really crappy rules, though maybe this site isn't up to date?
https://en.cppreference.com/w/c/language/identifier

What I understand from that is:

1. Yes, you can use any Unicode character you want in C/C++ (seemingly
since C99).
2. There are no rules about what *encoding* is acceptable; it's
implementation-defined. So various compilers have different rules as to
what will be accepted in the actual source code. In fact, I read
somewhere that not even ASCII is guaranteed to be supported.

The result is that you have to write the identifiers with an ASCII
escape sequence for them to be actually portable. To me, that
completely defeats the purpose of using such identifiers in the first place.

For example, on that page, they have a line that works in clang but not in
GCC (tagged as implementation-defined):

    char *🐱 = "cat";

The portable version looks like this:

    char *\U0001f431 = "cat";

Seriously, who wants to use that?

Now, D can potentially do better (especially since all the compilers
share the same front end) and support such things in the spec, but I
think the argument "because C supports it" is kind of bunk.

Or am I reading it wrong?

In any case, I would expect symbol name support to focus only on
languages that people actually use, not emoji. If there are words in
Chinese or Japanese that can't be expressed in D while other words can,
that would seem inconsistent to a Chinese- or Japanese-speaking user,
and I think we should work to fix it. I just have no idea what the
current state of that is.

I also tend to agree that most code is going to be written in English,
even when the user's primary language is not. Part of the reason, which
I haven't seen mentioned here yet, is that all the keywords are in
English. Someone has to at least somewhat understand those to grasp the
meaning of some constructs, and the code is going to read strangely when
mixed with non-English words.

One group that I believe hasn't spoken up yet is the team behind the
Hunt framework, who I believe are all Chinese? At least their web site
is. It would be good to hear from a group like that, which has extensive
experience writing mature D code (it all appears to be in English),
about how they feel about the support.
-Steve