Re: Updating D beyond Unicode 2.0

Steven Schveighoffer via Digitalmars-d Mon, 24 Sep 2018 06:30:33 -0700

On 9/24/18 12:23 AM, Neia Neutuladh wrote:

On Monday, 24 September 2018 at 01:39:43 UTC, Walter Bright wrote:
On 9/23/2018 3:23 PM, Neia Neutuladh wrote:
Okay, that's why you previously selected C99 as the standard for what characters to allow. Do you want to update to match C11? It's been out for the better part of a decade, after all.
I wasn't aware it changed in C11.
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf page 522 (PDF numbering) or 504 (internal numbering).
Outside the BMP, almost everything is allowed, including many things that are not currently mapped to any Unicode value. Within the BMP, a heck of a lot of stuff is allowed, including a lot that D doesn't currently allow.
GCC hasn't even updated to the C99 standard here, as far as I can tell, but clang-5.0 is up to date.

I searched around for the current state of symbol names in C and found some really crappy rules, though maybe this site isn't up to date:


https://en.cppreference.com/w/c/language/identifier

What I understand from that is:

1. Yes, you can use any Unicode character you want in C/C++ (seemingly since C99).
2. There are no rules about what *encoding* is acceptable; it's implementation defined. So various compilers have different rules as to what will be accepted in the actual source code. In fact, I read somewhere that not even ASCII is guaranteed to be supported.

The result is that you have to write the identifiers with an ASCII escape sequence for them to actually be portable, which to me completely defeats the purpose of using such identifiers in the first place.

For example, on that page, they have a line that works in clang, not in GCC (tagged as implementation defined):


char *🐱 = "cat";

The portable version looks like this:

char *\U0001f431 = "cat";

Seriously, who wants to use that?
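
To make that escaped form concrete, here is a rough sketch of how a tool could produce the portable spelling from a code point. Both function names are made up for illustration and are not from any compiler or library. The validity checks reflect my reading of C11 6.4.3 (which code points a universal character name may denote), and the second helper only covers the supplementary-plane ranges listed in C11 Annex D.1, not the long list of BMP ranges.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write the universal character name (UCN) spelling
 * of code point cp into buf. Returns the number of characters written,
 * or -1 if cp cannot be spelled as a UCN or buf is too small. */
int ucn_spelling(uint32_t cp, char *buf, size_t size)
{
    /* Beyond Unicode, or a surrogate: never valid in a UCN. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;
    /* Below U+00A0 only $, @ and ` may be written as a UCN. */
    if (cp < 0xA0 && cp != 0x24 && cp != 0x40 && cp != 0x60)
        return -1;

    /* Short form \uXXXX inside the BMP, long form \UXXXXXXXX outside it. */
    int n = (cp <= 0xFFFF)
        ? snprintf(buf, size, "\\u%04x", (unsigned)cp)
        : snprintf(buf, size, "\\U%08x", (unsigned)cp);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: nonzero if cp lies in one of the supplementary-plane
 * ranges of C11 Annex D.1 (10000-1FFFD, 20000-2FFFD, ..., E0000-EFFFD).
 * The BMP ranges are deliberately not covered here. */
int c11_allows_supplementary(uint32_t cp)
{
    return cp >= 0x10000 && cp <= 0xEFFFD && (cp & 0xFFFF) <= 0xFFFD;
}
```

Calling ucn_spelling(0x1F431, buf, sizeof buf) with a 16-byte buffer fills it with the escaped name used in the portable line above, and c11_allows_supplementary(0x1F431) is nonzero, so the cat is a legal C11 identifier character even though nobody would want to type it that way.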

Now, D can potentially do better (especially when all front-ends are the same) and support such things in the spec, but I think the argument "because C supports it" is kind of bunk.


Or am I reading it wrong?

In any case, I would expect that symbol name support should be focused only on languages which people use, not emojis. If there are words in Chinese or Japanese that can't be expressed using D, while other words can, it would seem inconsistent to a Chinese- or Japanese-speaking user, and I think we should work to fix that. I just have no idea what the state of that is.

I also tend to agree that most code is going to be written in English, even when the primary language of the user is not. Part of the reason, which I haven't read here yet, is that all the keywords are in English. Someone has to kind of understand those to get the meaning of some constructs, and it's going to read strangely with the non-English words.

One group which I believe hasn't spoken up yet is the group making the hunt framework, who I believe are all Chinese? At least their web site is. It would be good to hear from a group like that, which has a lot of experience writing mature D code (it appears all to be in English), and how they feel about the support.


-Steve
