On Thursday, 6 September 2018 at 20:15:22 UTC, Jonathan M Davis
wrote:
On Thursday, September 6, 2018 1:04:45 PM MDT aliak via
Digitalmars-d wrote:
D makes the code-point case default and hence that becomes the
simplest to use. But unfortunately, the only thing I can think of
that requires code point representations is when dealing
specifically with Unicode algorithms (normalization, etc.). Here's
a good read on code points:
https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/ -
tl;dr: application logic does not need or want to deal with
code points. For speed, code units work, and for correctness,
graphemes work.
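To make the three levels concrete, here is a minimal sketch that counts the same string at each one. The helper names (countCodeUnits, countCodePoints, countGraphemes) are made up for illustration; the Phobos calls used are std.utf.byCodeUnit, std.uni.byGrapheme and std.range.walkLength.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

/// Hypothetical helper: number of UTF-8 code units (bytes) in `s`.
size_t countCodeUnits(string s)
{
    return s.byCodeUnit.walkLength;
}

/// Hypothetical helper: number of code points in `s`.
/// walkLength on a plain string auto-decodes to dchar.
size_t countCodePoints(string s)
{
    return s.walkLength;
}

/// Hypothetical helper: number of graphemes (user-perceived characters).
size_t countGraphemes(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // "noël" spelled as 'e' followed by U+0308 COMBINING DIAERESIS.
    string s = "noe\u0308l";

    writeln(countCodeUnits(s));  // 6
    writeln(countCodePoints(s)); // 5
    writeln(countGraphemes(s));  // 4
}
```

The default (auto-decoded) answer of 5 matches neither the storage size nor what a user would call the length of the word, which is the point being made above.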
I think that it's pretty clear that code points are objectively
the worst level to be the default. Unfortunately, changing it
to _anything_ else is not going to be an easy feat at this
point. But if we can first ensure that Phobos in general
doesn't rely on it (i.e. in general, it can deal with ranges of
char, wchar, dchar, or graphemes correctly rather than assuming
that all ranges of characters are ranges of dchar), then maybe
we can figure something out. Unfortunately, while some work has
been done towards that, what's mostly happened is that folks
have complained about auto-decoding without doing much to
improve the current situation. There's a lot more to this than
simply ripping out auto-decoding even if every D user on the
planet agreed that outright breaking almost every existing D
program to get rid of auto-decoding was worth it. But as with
too many things around here, there's a lot more talking than
working. And actually, as such, I should probably stop
discussing this and go do something useful.
- Jonathan M Davis
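As a rough illustration of what "doesn't rely on it" could look like, here is a sketch of a function that accepts any range of char, wchar or dchar and never decodes. countSpaces is a hypothetical helper, not an existing Phobos function, and this is only one possible way to write such code.

```d
import std.range.primitives : ElementEncodingType, isInputRange;
import std.traits : isSomeChar;
import std.utf : byCodeUnit;

/// Hypothetical helper: counts ASCII spaces in any range of characters.
/// An ASCII character is a single code unit in UTF-8, UTF-16 and UTF-32,
/// and never occurs inside a multi-unit sequence, so no decoding is needed.
size_t countSpaces(R)(R r)
if (isInputRange!R && isSomeChar!(ElementEncodingType!R))
{
    size_t n = 0;
    // byCodeUnit bypasses auto-decoding for narrow strings and
    // passes other character ranges through unchanged.
    foreach (c; r.byCodeUnit)
    {
        if (c == ' ')
            ++n;
    }
    return n;
}

void main()
{
    assert(countSpaces("a b c") == 2);  // range of char
    assert(countSpaces("a b c"w) == 2); // range of wchar
    assert(countSpaces("a b c"d) == 2); // range of dchar
    assert(countSpaces("noe\u0308l is here") == 2);
}
```

The template constraint uses ElementEncodingType rather than ElementType, so a string is seen as a range of char instead of the auto-decoded dchar.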
A tutorial page linked from the front page with some examples
would go a long way to making it easier for people. If I had
time and understood strings well enough to explain them to others
I would try to make a start, but unfortunately neither is true.
And if we are doing things right with RCString, then isn't it
easier to make the change there first, since it is new and so
can't break code, and then update Phobos in some years when people
are used to working that way (a compiler switch in the beginning,
with the big transition a few years after that)?
Isn't this one of the challenges created by the tension of D
being both a high-level and a low-level language? The higher the
aim, the more problems you will encounter getting there. That's
okay.
And isn't the obstacle to getting rid of auto-decoding that it
seems to be a monolithic challenge of overwhelming magnitude? If
we could figure out some steps to eat the elephant one mouthful
at a time (which might mean starting with RCString), then it
would seem less intimidating. It will perhaps take years anyway,
but so what?