Re: Proposal for fixing dchar ranges

Steven Schveighoffer Tue, 11 Mar 2014 11:07:18 -0700

On Tue, 11 Mar 2014 13:18:46 -0400, Chris Williams <[email protected]> wrote:

> On Tuesday, 11 March 2014 at 14:16:31 UTC, Steven Schveighoffer wrote:
>> But I would never expect any kind of indexing or slicing to use "number of code points", which clearly requires O(n) decoding to determine its position. That would be disastrous.
>
> If the indexes put into the slice aren't by code point, but people need to use proper helper functions to convert a code point into an index, then we're basically back to where we are today.

No, where we are today is that in some cases the language treats a char[] as an array of char, and in other cases it treats a char[] as a bi-directional dchar range.
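To make the split concrete, here is a minimal sketch, assuming a Phobos version in which narrow strings are auto-decoded by the range primitives. The variable name s is only for illustration. Indexing sees UTF-8 code units (char), while front and back see decoded code points (dchar):

```d
import std.range : front, back, ElementType;

void main()
{
    // "\u00E9" (e with acute accent) takes two UTF-8 code units; '!' takes one.
    char[] s = "\u00E9!".dup;

    // Array view: elements are char, and length counts code units.
    static assert(is(typeof(s[0]) == char));
    assert(s.length == 3);
    assert(s[0] == 0xC3); // first byte of the two-byte sequence

    // Range view: elements are dchar, decoded on access.
    static assert(is(typeof(s.front) == dchar));
    static assert(is(ElementType!(char[]) == dchar));
    assert(s.front == '\u00E9');
    assert(s.back == '!');
}
```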

What I'm proposing is that we have a type that defines "This is what a string looks like," and that it is consistent across all uses of the string, instead of the schizophrenic view we have now. I would also point out that quite a bit of deception and nonsense is needed to maintain that view, including things like assert(!hasLength!(char[]) && __traits(compiles, { char[] x; int y = x.length; })). The documentation for hasLength says "Tests if a given range has the length attribute," which is clearly a lie.
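A compilable variant of that check, as a sketch: it assumes a Phobos version that still special-cases narrow strings in hasLength, and it uses size_t rather than int for the length so that it also builds on 64-bit targets. The int[] line is added here only for contrast:

```d
import std.range : hasLength;

void main()
{
    // The trait claims char[] has no length...
    static assert(!hasLength!(char[]));

    // ...yet the property exists and compiles like any other array length.
    static assert(__traits(compiles, { char[] x; size_t y = x.length; }));

    // For contrast, a non-string array passes the same trait.
    static assert(hasLength!(int[]));
}
```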

However, I want to define right here that an index is not a number of code points. One does not frequently get code point counts; one gets indexes. It has always been that way, and I'm not planning to change that. That you can't use the index to determine the number of code points that came before it is not an issue that arises frequently.

For example, if I want to find the first instance of "xyz" in a string, do I care how many code points the search has to go through, or do I care at what point I have to slice the string to get it?


A previous poster brings up this incorrect code:

auto index = countUntil(str, "xyz");
auto newstr = str[index..$];

But it can easily be done this way also:

auto index = indexOf(str, "xyz");
auto codepts = walkLength(str[0..index]);
auto newstr = str[index..$];
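Here is a self-contained sketch of the difference between the two snippets. The sample string is an assumption chosen so that one character before "xyz" takes two code units. With it, countUntil reports a code point count, indexOf reports a code unit index, and only the latter is a correct slice position:

```d
import std.algorithm.searching : countUntil;
import std.range : walkLength;
import std.string : indexOf;

void main()
{
    // "\u00E9" occupies two UTF-8 code units, so counts and indexes diverge.
    string str = "h\u00E9llo xyz!";

    auto count = countUntil(str, "xyz"); // code points before the match
    auto index = indexOf(str, "xyz");    // code unit offset of the match

    assert(count == 6);
    assert(index == 7);

    // Slicing works on code units, so only the index gives the right result.
    assert(str[index .. $] == "xyz!");
    assert(str[count .. $] == " xyz!"); // off by one code unit

    // The code point count is still recoverable from the index when needed.
    assert(walkLength(str[0 .. index]) == 6);
}
```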

Given how D works, I think it would be very costly and nearly impossible to somehow make the incorrect slice operation statically rejected. One simply has to be trained in what a code point is and what a code unit is. HOWEVER, for the most part, nobody needs to care. Strings work fine without having to randomly access specific code points or slice based on them. Using indexes works just fine.


-Steve
