On Tue, 11 Mar 2014 13:18:46 -0400, Chris Williams <[email protected]> wrote:

On Tuesday, 11 March 2014 at 14:16:31 UTC, Steven Schveighoffer wrote:
But I would never expect any kind of indexing or slicing to use "number of code points", which clearly requires O(n) decoding to determine its position. That would be disastrous.

If the indexes put into the slice aren't by code-point, but people need to use proper helper functions to convert a code-point into an index, then we're basically back to where we are today.

No, where we are today is that in some cases, the language treats a char[] as an array of char, in other cases, it treats a char[] as a bi-directional dchar range.

What I'm proposing is we have a type that defines "This is what a string looks like," and it is consistent across all uses of the string, instead of the schizophrenic view we have now. I would also point out that quite a bit of deception and nonsense is needed to maintain that view, including things like assert(!hasLength!(char[]) && __traits(compiles, { char[] x; size_t y = x.length;})). The documentation for hasLength says "Tests if a given range has the length attribute," which is clearly a lie.
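To make the two views concrete, here is a minimal sketch, assuming a Phobos version in which narrow strings are still auto-decoded by the range primitives. The sample string "héllo" is only an illustration; the traits and functions used are the existing ones from std.range.

```d
import std.range : ElementType, hasLength, isRandomAccessRange, walkLength;

void main()
{
    // "é" is one code point but two UTF-8 code units.
    char[] s = "héllo".dup;

    // Array view: indexing yields a code unit and .length counts code units.
    static assert(is(typeof(s[0]) == char));
    assert(s.length == 6);

    // Range view: elements are decoded code points, and the range traits
    // report neither a length nor random access.
    static assert(is(ElementType!(char[]) == dchar));
    static assert(!hasLength!(char[]));
    static assert(!isRandomAccessRange!(char[]));
    assert(s.walkLength == 5);
}
```

The same variable answers "how long are you?" with 6 or 5 depending on which view the calling code happens to take.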

However, I want to state right here that an index is not a number of code points. One does not frequently get code point counts; one gets indexes. It has always been that way, and I'm not planning to change that. That you can't use the index to determine the number of code points that came before it is not an issue that arises frequently.

For example, if I want to find the first instance of "xyz" in a string, do I care how many code points the search has to go through, or at what point I have to slice the string to get it?

A previous poster brings up this incorrect code:

auto index = countUntil(str, "xyz");
auto newstr = str[index..$];

But it can easily be done this way also:

auto index = indexOf(str, "xyz");
auto codepts = walkLength(str[0..index]);
auto newstr = str[index..$];
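As a rough illustration of the difference between the two snippets, the sketch below runs both on a string containing one non-ASCII character. The sample string is made up for the example, and the behaviour of countUntil shown here assumes Phobos auto-decoding of narrow strings.

```d
import std.algorithm : countUntil;
import std.range : walkLength;
import std.string : indexOf;

void main()
{
    // Six code points ("h", "é", "l", "l", "o", " ") precede "xyz",
    // but they occupy seven UTF-8 code units because "é" takes two.
    string str = "héllo xyz!";

    // countUntil counts decoded code points, so using its result as a
    // slice index lands one code unit too early here.
    auto count = countUntil(str, "xyz");
    assert(count == 6);
    assert(str[count .. $] == " xyz!");

    // indexOf returns a code-unit index, which is what slicing expects.
    auto index = indexOf(str, "xyz");
    assert(index == 7);
    assert(str[index .. $] == "xyz!");

    // The code point count is still available when it is really needed.
    assert(walkLength(str[0 .. index]) == 6);
}
```

With pure ASCII input the two results coincide, which is why the incorrect version tends to go unnoticed in testing.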

Given how D works, I think it would be very costly and nearly impossible to have the incorrect slice operation statically rejected. One simply has to learn what a code point is, and what a code unit is. HOWEVER, for the most part, nobody needs to care. Strings work fine without having to randomly access specific code points or slice based on them. Using indexes works just fine.

-Steve
