Re: Updating D beyond Unicode 2.0

Jacob Carlborg via Digitalmars-d Tue, 25 Sep 2018 12:26:56 -0700

On 2018-09-21 18:27, Neia Neutuladh wrote:

D's currently accepted identifier characters are based on Unicode 2.0:
* ASCII range values are handled specially.
* Letters and combining marks from Unicode 2.0 are accepted.
* Numbers outside the ASCII range are accepted.
* Eight random punctuation marks are accepted.

This follows the C99 standard.
Many languages use the Unicode standard explicitly: C#, Go, Java, Python, ECMAScript, just to name a few. A small number of languages reject non-ASCII characters: Dart, Perl. Some languages are weirdly generous: Swift and C11 allow everything outside the Basic Multilingual Plane.
I'd like to update that so that D accepts something as a valid identifier character if it's a letter or combining mark or modifier symbol that's present in Unicode 11, or a non-ASCII number. This allows the 146 most popular writing systems and a lot more characters from those writing systems. This *would* reject those eight random punctuation marks, so I'll keep them in as legacy characters.
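
As a rough illustration of the rule described above (not the code from the proposed PR), the check can be approximated with Unicode general categories. This sketch assumes "letter" means categories L*, "combining mark" means M*, "modifier symbol" means Sk, and "number" means N*. The name is_ident_char and the empty LEGACY_CHARS set are hypothetical, since the eight legacy punctuation marks are not listed in this thread. Python's unicodedata follows whatever Unicode version ships with the interpreter, which may not be Unicode 11.

```python
import unicodedata

# Placeholder: the eight legacy punctuation marks are not listed in the
# thread, so this set is deliberately left empty.
LEGACY_CHARS = frozenset()


def is_ident_char(ch: str) -> bool:
    """Approximate the proposed identifier-character rule for one code point."""
    if ord(ch) < 0x80:
        # ASCII is handled specially by the lexer; as an assumption, accept
        # ASCII letters, digits and underscore here.
        return ch.isalnum() or ch == "_"
    cat = unicodedata.category(ch)
    return (
        cat[0] in "LM"        # letters (L*) and combining marks (M*)
        or cat == "Sk"        # modifier symbols
        or cat[0] == "N"      # non-ASCII numbers
        or ch in LEGACY_CHARS
    )
```

For example, is_ident_char("\U00010000") accepts the first Linear B syllable, while is_ident_char("\u20ac") rejects the euro sign because it is a currency symbol.
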
It would mean we don't have to reference the C99 standard when enumerating the allowed characters; we just have to refer to the Unicode standard, which we already need to talk about in the lexical part of the spec.
It might also make the lexer a tiny bit faster; it reduces the number of valid-ident-char segments to search from 245 to 134. On the other hand, it will change the ident char ranges from wchar to dchar, which means the table takes up marginally more memory.
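
To make the trade-off above concrete, here is a minimal sketch of a lookup in a sorted table of code point segments, plus a back-of-the-envelope size estimate. It is not how the D lexer is actually written. The four segments in RANGES are a hypothetical excerpt, and table_bytes assumes each segment is stored as two endpoints, with 2 bytes per endpoint for wchar and 4 bytes for dchar.

```python
from bisect import bisect_right

# Hypothetical excerpt of a sorted, non-overlapping segment table
# (inclusive start and end code points). The real table would be longer.
RANGES = [
    (0x00AA, 0x00AA),
    (0x00B5, 0x00B5),
    (0x00C0, 0x00D6),
    (0x10000, 0x1000B),
]
STARTS = [lo for lo, _ in RANGES]


def in_ranges(cp: int) -> bool:
    """Binary-search for the segment that could contain cp, then check it."""
    i = bisect_right(STARTS, cp) - 1
    return i >= 0 and RANGES[i][0] <= cp <= RANGES[i][1]


def table_bytes(segments: int, unit_bytes: int) -> int:
    """Estimated table size, assuming two endpoints stored per segment."""
    return segments * 2 * unit_bytes
```

Under those assumptions, table_bytes(245, 2) gives 980 bytes for the current table and table_bytes(134, 4) gives 1072 bytes for the new one, which is consistent with "marginally more memory".
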
And, of course, it lets you write programs entirely in Linear B, and that's a marketing ploy not to be missed.
I've got this coded up and can submit a PR, but I thought I'd get feedback here first.
Does anyone see any horrible potential problems here?

Or is there an interestingly better option?

Does this need a DIP?

I'm not a native English speaker, but I write all my public and private code in English. I expect anyone I work with to write their code in English as well, and I make sure they do. English alone is not enough either; it has to be American English.

Despite this, I think that D should support as much of Unicode as possible (including using Unicode for identifiers). It should not be up to the programming language to decide which language the developer should write the code in.


--
/Jacob Carlborg
