On 9/24/18 12:23 AM, Neia Neutuladh wrote:
On Monday, 24 September 2018 at 01:39:43 UTC, Walter Bright wrote:
On 9/23/2018 3:23 PM, Neia Neutuladh wrote:
Okay, that's why you previously selected C99 as the standard for what characters to allow. Do you want to update to match C11? It's been out for the better part of a decade, after all.

I wasn't aware it changed in C11.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf page 522 (PDF numbering) or 504 (internal numbering).

Outside the BMP, almost everything is allowed, including many things that are not currently mapped to any Unicode value. Within the BMP, a heck of a lot of stuff is allowed, including a lot that D doesn't currently allow.

GCC hasn't even updated to the C99 standard here, as far as I can tell, but clang-5.0 is up to date.
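To make the Annex D rule concrete, here is a rough C sketch of the check it describes. The ranges are transcribed by hand from N1570 Annex D.1 (allowed in identifiers) and D.2 (allowed, but not as the first character), so treat the table as an assumption to verify against the PDF. The names `c11_ident_char_ok` and `c11_ident_start_ok` are hypothetical. The sketch only covers extended characters; plain ASCII letters, digits, and underscore come from the ordinary identifier grammar and are not in these tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of code points. */
struct cp_range { uint32_t lo, hi; };

/* BMP part of C11 Annex D.1, transcribed by hand from N1570 (verify before use). */
static const struct cp_range c11_allowed_bmp[] = {
    {0x00A8, 0x00A8}, {0x00AA, 0x00AA}, {0x00AD, 0x00AD}, {0x00AF, 0x00AF},
    {0x00B2, 0x00B5}, {0x00B7, 0x00BA}, {0x00BC, 0x00BE}, {0x00C0, 0x00D6},
    {0x00D8, 0x00F6}, {0x00F8, 0x00FF}, {0x0100, 0x167F}, {0x1681, 0x180D},
    {0x180F, 0x1FFF}, {0x200B, 0x200D}, {0x202A, 0x202E}, {0x203F, 0x2040},
    {0x2054, 0x2054}, {0x2060, 0x206F}, {0x2070, 0x218F}, {0x2460, 0x24FF},
    {0x2776, 0x2793}, {0x2C00, 0x2DFF}, {0x2E80, 0x2FFF}, {0x3004, 0x3007},
    {0x3021, 0x302F}, {0x3031, 0x303F}, {0x3040, 0xD7FF}, {0xF900, 0xFD3D},
    {0xFD40, 0xFDCF}, {0xFDF0, 0xFE44}, {0xFE47, 0xFFFD},
};

/* C11 Annex D.2: combining marks that may not start an identifier. */
static const struct cp_range c11_not_initial[] = {
    {0x0300, 0x036F}, {0x1DC0, 0x1DFF}, {0x20D0, 0x20FF}, {0xFE20, 0xFE2F},
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

static bool in_ranges(uint32_t cp, const struct cp_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (cp >= r[i].lo && cp <= r[i].hi)
            return true;
    return false;
}

/* May this extended (non-ASCII) code point appear in a C11 identifier? */
static bool c11_ident_char_ok(uint32_t cp)
{
    /* Outside the BMP, D.1 allows planes 1 through E wholesale, minus the
       last two code points (xxFFFE, xxFFFF) of each plane, whether or not
       anything has been assigned there yet. */
    if (cp >= 0x10000)
        return cp <= 0xEFFFD && (cp & 0xFFFF) <= 0xFFFD;
    return in_ranges(cp, c11_allowed_bmp, COUNT(c11_allowed_bmp));
}

/* Same check for the first character of an identifier (adds D.2). */
static bool c11_ident_start_ok(uint32_t cp)
{
    return c11_ident_char_ok(cp)
        && !in_ranges(cp, c11_not_initial, COUNT(c11_not_initial));
}
```

Under this table U+1F431 (🐱) passes, and so does a code point in a currently unassigned plane such as U+40000, which is what "almost everything is allowed" outside the BMP amounts to. Inside the BMP, gaps like U+00D7 (×) or U+3000 (ideographic space) are rejected.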

I searched around for the current state of symbol names in C, and found some really crappy rules, though maybe this site isn't up to date?:

https://en.cppreference.com/w/c/language/identifier

What I understand from that is:

1. Yes, you can use any Unicode character you want in C/C++ (seemingly since C99).
2. There are no rules about what *encoding* is acceptable; it's implementation-defined. So various compilers have different rules as to what will be accepted in the actual source code. In fact, I read somewhere that not even ASCII is guaranteed to be supported.

The result is that you have to write the identifiers with ASCII escape sequences for the code to be actually portable. To me, that completely defeats the purpose of using such identifiers in the first place.

For example, that page has a line that works in clang but not in GCC (tagged as implementation-defined):

char *🐱 = "cat";

The portable version looks like this:

char *\U0001f431 = "cat";

Seriously, who wants to use that?
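For what it's worth, producing that spelling is purely mechanical: decode the UTF-8 and write each non-ASCII code point as a 4-digit (BMP) or 8-digit universal character name. Below is a rough C sketch of such a converter. `utf8_decode` and `to_ucn_identifier` are hypothetical helpers, not part of any compiler or library, and the decoder does only minimal validation (it does not reject overlong forms or surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at s into *cp; return the number of bytes
   consumed, or 0 if malformed. Overlong forms and surrogates are not rejected. */
static size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    size_t len;
    uint32_t c;

    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else return 0;

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)  /* also stops at a premature NUL */
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* Rewrite a UTF-8 identifier into its portable spelling: ASCII is copied,
   other code points become a 4-digit (BMP) or 8-digit universal character
   name. Returns 0 on success, -1 on malformed UTF-8 or a too-small buffer. */
static int to_ucn_identifier(const char *in, char *out, size_t out_size)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    while (*p != '\0') {
        char piece[11];  /* longest piece: backslash, 'U', 8 hex digits, NUL */
        uint32_t cp;
        int w;
        size_t n = utf8_decode(p, &cp);

        if (n == 0)
            return -1;
        p += n;
        if (cp < 0x80)
            w = snprintf(piece, sizeof piece, "%c", (int)cp);
        else if (cp <= 0xFFFF)
            w = snprintf(piece, sizeof piece, "\\u%04lX", (unsigned long)cp);
        else
            w = snprintf(piece, sizeof piece, "\\U%08lX", (unsigned long)cp);
        if (w < 0 || used + (size_t)w >= out_size)  /* keep room for the NUL */
            return -1;
        memcpy(out + used, piece, (size_t)w);
        used += (size_t)w;
    }
    out[used] = '\0';
    return 0;
}
```

Feeding it the UTF-8 bytes of 🐱 yields `\U0001F431`, the same escape as the portable line above (hex digits in a universal character name may be upper- or lowercase). A tool can do the conversion, but the result is no more readable.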

Now, D can potentially do better (especially when all front-ends are the same) and support such things in the spec, but I think the argument "because C supports it" is kind of bunk.

Or am I reading it wrong?

In any case, I would expect that symbol name support should be focused only on languages which people use, not emojis. If there are words in Chinese or Japanese that can't be expressed using D, while other words can, it would seem inconsistent to a Chinese- or Japanese-speaking user, and I think we should work to fix that. I just have no idea what the state of that is.

I also tend to agree that most code is going to be written in English, even when the primary language of the user is not. Part of the reason, which I haven't seen mentioned here yet, is that all the keywords are in English. Someone has to understand those to some degree to get the meaning of some constructs, and the code is going to read strangely with non-English words mixed in.

One group which I believe hasn't spoken up yet is the group making the hunt framework, who I believe are all Chinese? At least their web site is. It would be good to hear from a group like that, which has extensive experience writing mature D code (it all appears to be in English), about how they feel about the support.

-Steve
