On Thursday, 6 September 2018 at 20:15:22 UTC, Jonathan M Davis wrote:
On Thursday, September 6, 2018 1:04:45 PM MDT aliak via Digitalmars-d wrote:
D makes the code-point case the default, and hence it becomes the
simplest to use. But unfortunately, the only case I can think of
that requires code point representations is when dealing
specifically with Unicode algorithms (normalization, etc.). Here's
a good read on code points:
https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/

tl;dr: application logic does not need or want to deal with code points. For speed, code units work, and for correctness, graphemes work.
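To make the three levels concrete, here is a minimal sketch that counts the same string as code units, code points, and graphemes. The helper names (codeUnits, codePoints, graphemes) are made up for illustration; the Phobos calls used are std.utf.byCodeUnit, std.uni.byGrapheme, and std.range.walkLength.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

// Hypothetical helpers, named here only for illustration.

// Code units: the raw UTF-8 bytes of the string, no decoding involved.
size_t codeUnits(string s) { return s.byCodeUnit.walkLength; }

// Code points: what auto-decoding yields when a string is used as a range.
size_t codePoints(string s) { return s.walkLength; }

// Graphemes: user-perceived characters.
size_t graphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT, displayed as one character.
    string s = "e\u0301";

    writeln(codeUnits(s));  // 3 (1 byte for 'e', 2 bytes for U+0301)
    writeln(codePoints(s)); // 2
    writeln(graphemes(s));  // 1
}
```

Only the grapheme count matches what a user would call "one character"; the code-point count is neither the cheapest nor the correct one.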

I think that it's pretty clear that code points are objectively the worst level to be the default. Unfortunately, changing it to _anything_ else is not going to be an easy feat at this point. But if we can first ensure that Phobos in general doesn't rely on it (i.e. in general, it can deal with ranges of char, wchar, dchar, or graphemes correctly rather than assuming that all ranges of characters are ranges of dchar), then maybe we can figure something out. Unfortunately, while some work has been done towards that, what's mostly happened is that folks have complained about auto-decoding without doing much to improve the current situation. There's a lot more to this than simply ripping out auto-decoding even if every D user on the planet agreed that outright breaking almost every existing D program to get rid of auto-decoding was worth it. But as with too many things around here, there's a lot more talking than working. And actually, as such, I should probably stop discussing this and go do something useful.

- Jonathan M Davis

Is there a unittest somewhere in Phobos that you know of and could point to, one that shows the handling of the 4 variations you say should be dealt with first? Or maybe a PR that did some of this work that one could investigate?

I ask so I can see in code what it means to make something not rely on autodecoding and deal with ranges of char, wchar, dchar or graphemes.

Or maybe a current "easy" Bugzilla issue that one could try a hand at?
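For reference, this is a rough guess at what "not relying on auto-decoding" might look like, and it is not taken from Phobos. The function countSpaces is hypothetical. The assumption is that such a function is constrained on the element type being any character type rather than specifically dchar, so callers can pass code units directly via std.utf.byCodeUnit. Graphemes are left out of this sketch because their element type is not a character type.

```d
import std.range.primitives : ElementType, isInputRange;
import std.stdio : writeln;
import std.traits : isSomeChar;
import std.utf : byCodeUnit;

/// Hypothetical helper (not a Phobos function): counts ASCII spaces in any
/// input range of char, wchar, or dchar. It never assumes the element type
/// is dchar, so it works on code units without decoding.
size_t countSpaces(R)(R r)
if (isInputRange!R && isSomeChar!(ElementType!R))
{
    size_t n;
    // ASCII values never occur inside a multi-unit UTF-8 or UTF-16 sequence,
    // so comparing individual code units against ' ' is safe.
    foreach (c; r)
        if (c == ' ')
            ++n;
    return n;
}

void main()
{
    writeln(countSpaces("a b c".byCodeUnit));  // range of char
    writeln(countSpaces("a b c"w.byCodeUnit)); // range of wchar
    writeln(countSpaces("a b c"d));            // range of dchar
}
```

All three calls go through the same template; only the element type of the range differs.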
