On Tuesday, 17 July 2018 at 18:09:13 UTC, Jonathan M Davis wrote:
On Tuesday, July 17, 2018 17:28:19 Seb via Digitalmars-d wrote:
On Tuesday, 17 July 2018 at 16:58:37 UTC, Jonathan M Davis
wrote:
> [...]
Well, there are a few cases where the range type doesn't matter
and one can simply compare bytes, e.g.
equal (e.g. "ä" == "ä" <=> [195, 164] == [195, 164])
commonPrefix
find
...
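A minimal sketch of the byte-comparison point above, using plain D strings because rcstring's API is not shown in this thread. `sameCodeUnits` is a hypothetical helper made up for illustration; `representation` and `equal` are existing Phobos functions.

```d
import std.algorithm.comparison : equal;
import std.string : representation;

// Hypothetical helper (not part of Phobos or rcstring): compares two
// strings by their raw UTF-8 code units, without any decoding.
bool sameCodeUnits(string a, string b)
{
    return equal(a.representation, b.representation);
}

void main()
{
    string a = "\u00E4"; // "ä" as the single code point U+00E4

    // U+00E4 is encoded in UTF-8 as the bytes 0xC3 0xA4, i.e. [195, 164].
    assert(equal(a.representation, [195, 164]));

    // Comparing code units agrees with comparing decoded code points here.
    assert(sameCodeUnits(a, "\u00E4"));
    assert(equal(a, "\u00E4"));
}
```

This only holds when both sides use the same sequence of code points; a precomposed "ä" and a decomposed "a" plus combining diaeresis still compare unequal at the byte level.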
That effectively means treating rcstring as a range of char by
default rather than not treating it as a range by default. And
if we then do that only with functions that overload on
rcstring rather than making rcstring actually a range of char,
then why aren't we just treating it as a range of char in
general?
IMHO, the fact that so many algorithms currently special-case
on arrays of characters is one reason that auto-decoding has
been a disaster, and adding a bunch of overloads for rcstring
is just compounding the problem. Algorithms should properly
support arbitrary ranges of characters, and then rcstring can
be passed to them by calling one of the functions on it to get
a range of code units, code points, or graphemes to get an
actual range - either that, or rcstring should default to being
a range of char. Going halfway and making it work with some
functions via overloads really doesn't make sense.
Well, the problem with it being a range of char is that this might
lead to very confusing behavior, e.g.
"ä".rcstring.split.join("|") == �|�
So we probably shouldn't go this route either.
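To make the confusion concrete, here is a rough sketch with plain D strings (rcstring itself is not used, since its API is not shown here). It counts the elements of the same text when viewed as code units, code points, and graphemes. `Counts` and `countAll` are hypothetical names for illustration; `byCodeUnit`, `byDchar`, `byGrapheme`, and `walkLength` are existing Phobos functions.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical illustration type: element counts for the three views.
struct Counts
{
    size_t codeUnits;
    size_t codePoints;
    size_t graphemes;
}

// Hypothetical helper: walks the same string through all three views.
Counts countAll(string s)
{
    return Counts(s.byCodeUnit.walkLength,
                  s.byDchar.walkLength,
                  s.byGrapheme.walkLength);
}

void main()
{
    // Precomposed "ä" (U+00E4): two UTF-8 code units, one code point.
    assert(countAll("\u00E4") == Counts(2, 1, 1));

    // Decomposed "ä" ("a" + U+0308): three code units, two code points,
    // but still a single grapheme.
    assert(countAll("a\u0308") == Counts(3, 2, 1));
}
```

An algorithm that works element by element on the code-unit view can therefore cut a character in half, which is where the invalid output in the example above comes from.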
The idea of adding overloads was to introduce a bit of
user convenience, so that they don't have to say
readText("foo".rcstring.by!char)
all the time.
You can still normalize with auto-decoding (the code units -
and thus code points - are in a specific order even when
encoded, and that order can be normalized), and really, anyone
who wants fully correct string comparisons needs to be
normalizing their strings. With that in mind, rcstring probably
should support normalization of its internal representation.
It currently doesn't support this out of the box, but it's a very
valid point and I added it to the list.
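As a sketch of what the normalization point means in practice, the following uses `std.uni.normalize` on plain D strings. How rcstring would expose this is not decided in this thread, so nothing here should be read as its API.

```d
import std.uni : normalize, NFC, NFD;

void main()
{
    string precomposed = "\u00E4"; // "ä" as one code point
    string decomposed = "a\u0308"; // "a" followed by combining diaeresis

    // Both render as "ä", but the code units differ, so a plain
    // comparison reports them as unequal.
    assert(precomposed != decomposed);

    // After normalizing both to the same form, they compare equal.
    assert(normalize!NFC(decomposed) == precomposed);
    assert(normalize!NFD(precomposed) == decomposed);
}
```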