On Tuesday, 17 July 2018 at 18:09:13 UTC, Jonathan M Davis wrote:
> On Tuesday, July 17, 2018 17:28:19 Seb via Digitalmars-d wrote:
>> On Tuesday, 17 July 2018 at 16:58:37 UTC, Jonathan M Davis wrote:
>>> [...]

>> Well, there are a few cases where the range type doesn't matter and one can simply compare bytes, e.g.
>>
>> equal ("ä" == "ä" <=> [195, 164] == [195, 164])
>> commonPrefix
>> find
>> ...
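
For `equal` and `find`, comparing raw UTF-8 code units gives the same answer as comparing decoded code points. Each sequence of code points has exactly one valid UTF-8 encoding, and because UTF-8 is self-synchronizing, a valid needle can only match a valid haystack at code point boundaries. `commonPrefix` needs a little more care, because a byte-wise prefix can stop inside a multi-byte sequence. Here is a minimal D sketch. It assumes a plain `string` stands in for the proposed rcstring, which is not part of Phobos, and the helper names are hypothetical wrappers around `std.utf.byCodeUnit`:

```d
import std.algorithm.comparison : equal;
import std.algorithm.searching : commonPrefix, find;
import std.utf : byCodeUnit;

// Assumption: plain `string` stands in for the proposed rcstring,
// which is not part of Phobos. The helper names are hypothetical.

/// Equality on raw UTF-8 code units. Valid UTF-8 has exactly one encoding
/// per code point sequence, so equal bytes <=> equal code points.
bool equalByCodeUnit(string a, string b)
{
    return equal(a.byCodeUnit, b.byCodeUnit);
}

/// Substring search on raw code units. UTF-8 is self-synchronizing, so a
/// valid needle can only match at a code point boundary of a valid haystack.
/// Returns haystack.length when the needle does not occur.
size_t indexByCodeUnit(string haystack, string needle)
{
    auto rest = haystack.byCodeUnit.find(needle.byCodeUnit);
    return haystack.length - rest.length;
}

/// Length of the common prefix in code units. Unlike equal/find, this can
/// stop in the middle of a multi-byte sequence.
size_t prefixLengthByCodeUnit(string a, string b)
{
    return commonPrefix(a.byCodeUnit, b.byCodeUnit).length;
}
```

By contrast, Phobos's `commonPrefix` on plain strings returns an empty prefix for "ä" and "ö". The code-unit version stops after their shared lead byte 0xC3.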

> That effectively means treating rcstring as a range of char by default rather than not treating it as a range by default. And if we then do that only with functions that overload on rcstring, rather than making rcstring actually a range of char, then why aren't we just treating it as a range of char in general?
>
> IMHO, the fact that so many algorithms currently special-case arrays of characters is one reason that auto-decoding has been a disaster, and adding a bunch of overloads for rcstring just compounds the problem. Algorithms should properly support arbitrary ranges of characters; rcstring can then be passed to them by calling one of its functions that returns a range of code units, code points, or graphemes - either that, or rcstring should default to being a range of char. Going halfway and making it work with some functions via overloads really doesn't make sense.
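
The three iteration levels mentioned above behave quite differently on the same text. The following sketch is a minimal illustration under the assumption that a plain `string` stands in for rcstring. The `join*` helpers are hypothetical. Each one iterates at a single level using the Phobos ranges `byCodeUnit`, `byDchar` and `byGrapheme`, then joins the elements with "|":

```d
import std.algorithm.iteration : map;
import std.array : array, join;
import std.conv : to;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Assumption: plain `string` stands in for the proposed rcstring.
// Each hypothetical helper iterates `s` at one level and joins the
// individual elements with "|".

/// Code units: one element per UTF-8 byte. The bytes of a multi-byte
/// sequence end up separated, so the result is not valid UTF-8.
string joinCodeUnits(string s)
{
    return s.byCodeUnit.map!(c => [c].idup).join("|");
}

/// Code points: one element per dchar, re-encoded as UTF-8.
string joinCodePoints(string s)
{
    return s.byDchar.map!(c => c.to!string).join("|");
}

/// Graphemes: one element per user-perceived character.
string joinGraphemes(string s)
{
    return s.byGrapheme.map!(g => g[].array.to!string).join("|");
}
```

For a precomposed "ä", `joinCodeUnits` yields the bytes 0xC3, '|', 0xA4. A UTF-8 terminal typically renders this as replacement characters around the bar, like the `�|�` output quoted in this thread. A decomposed "ä" ('a' followed by U+0308) survives only at the grapheme level.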

Well, the problem with it being a range of char is that this might lead to very confusing behavior, e.g.

"ä".rcstring.split.join("|") == �|�

So we probably shouldn't go this route either.
The idea of adding overloads was to introduce a bit of user convenience, so that users don't have to write

readText("foo".rcstring.by!char)

all the time.
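
In the `by!char` design, the string type itself is deliberately not a range, so callers pick the iteration level explicitly. The sketch below is hypothetical: `TextSketch` is an invented name, it does no reference counting, and the real rcstring API may differ. It only shows how such an accessor could dispatch to existing Phobos ranges:

```d
import std.range.primitives : isInputRange;
import std.uni : byGrapheme, Grapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical sketch: not the real rcstring API, and no reference counting.
struct TextSketch
{
    private string payload;

    /// Returns a range over the payload at the requested level.
    auto by(T)()
    {
        static if (is(T == char))
            return payload.byCodeUnit;   // UTF-8 code units
        else static if (is(T == dchar))
            return payload.byDchar;      // code points
        else static if (is(T == Grapheme))
            return payload.byGrapheme;   // user-perceived characters
        else
            static assert(0, "by!T supports char, dchar and Grapheme only");
    }
}

// The wrapper itself is deliberately not a range.
static assert(!isInputRange!TextSketch);
```

With this layout, an algorithm never sees the wrapper directly, so no overloads on the string type are needed. The cost is the explicit `.by!T` at each call site, which is the inconvenience the overloads were meant to avoid.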

> You can still normalize without auto-decoding (the code units - and thus code points - are in a specific order even when encoded, and that order can be normalized), and really, anyone who wants fully correct string comparisons needs to be normalizing their strings. With that in mind, rcstring probably should support normalization of its internal representation.
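
To illustrate the normalization point: "ä" can be stored either as the precomposed code point U+00E4 or as 'a' followed by U+0308 COMBINING DIAERESIS. The two differ code unit by code unit, so a byte-wise comparison calls them unequal until both sides are normalized. The minimal sketch below uses Phobos's `std.uni.normalize`, which takes and returns an encoded string. `canonicallyEqual` is a hypothetical helper, and a plain `string` again stands in for rcstring:

```d
import std.uni : normalize, NFC, NFD;

// Assumption: plain `string` stands in for the proposed rcstring;
// `canonicallyEqual` is a hypothetical helper, not a Phobos function.

enum precomposed = "\u00E4";  // U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
enum decomposed = "a\u0308";  // 'a' + U+0308 COMBINING DIAERESIS

/// Normalizes both strings to NFC, then compares them; `==` on D strings
/// is a plain code-unit comparison.
bool canonicallyEqual(string a, string b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}
```

If a string type kept its payload in one normalization form (say NFC), its equality could then be a plain byte comparison. That is one possible design, not something settled here.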

It currently doesn't support this out of the box, but it's a very valid point and I added it to the list.
