On Wednesday, September 21, 2011 19:56:47 Christophe wrote:
> "Jonathan M Davis", in message (digitalmars.D:144922), wrote:
> > 1. drop says nothing about slicing.
> > 2. popFrontN (which drop calls) says that it slices for ranges that
> > support slicing. Strings do not unless they're arrays of dchar.
> >
> > Yes, hasSlicing should probably be clearer about narrow strings, but
> > that has nothing to do with drop.
>
> I never said there was a problem with drop.
Yes you did. You said:

"mini-quiz: what should std.range.drop(some_string, 1) do ?
hint: what it actually does is not what the documentation of phobos suggests*..."

> After having read all of you, I have no problems with string being a
> lazy range of dchar. But I have a problem with immutable(char)[] being
> lazy range of dchar (ie not being a array), and I have a problem with
> string being immutable(char)[] (ie providing length opIndex and
> opSlice).

For efficiency, you need to be able to treat strings as arrays of code units for some algorithms. For correctness, you need to be able to treat them as ranges of code points (dchar) in the general case. You need both. The question is how to provide that.

Strings as arrays came first (D1); ranges came later. We _need_ to treat strings as ranges of dchar, or they're essentially unusable in the general case. Operating on code units is almost always _wrong_. So, when we added the range functions, we special-cased them for strings so that strings are treated as ranges of dchar, as they need to be. And in cases where you actually need to treat a string as an array of code units for efficiency, you special-case the function for them, and you still get that efficiency. What other way would you do it?

There _are_ some rough edges here, such as foreach defaulting to char for string when dchar is really what you should be iterating with. There are also times when you want to use a string with a range-based function and can't, because the function needs a random-access range or a sliceable one to do what it does, which can be annoying. But what else can you do there? You can't treat the string as a range of code units in that case; the result would be completely wrong. Imagine if sort worked on a char[]. You'd get an array of sorted code units, which would _not_ form valid code points and would be completely useless. So, treating a string as a range of code units makes no sense.
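To make the code-unit versus code-point distinction concrete, here is a small sketch. It is written in Python rather than D, purely for illustration. A `bytes` object stands in for a UTF-8 string viewed as an array of code units. The decoded `str` stands in for the range of dchar that the range functions see. The helpers `drop_code_units`, `drop_code_points` and `is_valid_utf8` are hypothetical names and are not Phobos functions. Dropping by code point models how a narrow string behaves as a range of dchar (popFrontN cannot slice it, per the discussion above). Dropping or sorting by code unit shows why that view breaks the encoding.

```python
# Sketch only: Python stand-ins for D strings.
#   bytes -> a UTF-8 string viewed as an array of code units (immutable(char)[])
#   str   -> the same text viewed as a range of code points (dchar)


def drop_code_units(s: bytes, n: int) -> bytes:
    """Hypothetical helper: drop n code units by plain slicing."""
    return s[n:]


def drop_code_points(s: bytes, n: int) -> bytes:
    """Hypothetical helper: drop n code points (the range-of-dchar view)."""
    return s.decode("utf-8")[n:].encode("utf-8")


def is_valid_utf8(b: bytes) -> bool:
    """Hypothetical helper: True if b decodes as UTF-8."""
    try:
        b.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


s = "\u00e9t\u00e9".encode("utf-8")  # "été": each 'é' is two UTF-8 code units

print(len(s))                  # 5 (code units)
print(len(s.decode("utf-8")))  # 3 (code points)

# Dropping one code point removes both code units of the leading 'é'.
print(drop_code_points(s, 1))  # b't\xc3\xa9', i.e. "té"

# Dropping one code unit cuts 'é' in half and leaves invalid UTF-8.
print(is_valid_utf8(drop_code_units(s, 1)))  # False

# Sorting code units scrambles multi-byte sequences, as with sort on a char[].
print(is_valid_utf8(bytes(sorted(s))))  # False
```

The same text has two different lengths depending on the view. Only the code-point view preserves well-formed UTF-8 when elements are removed or reordered.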
We could switch to having a struct of some kind which was a string, make it a range of dchar, and have it contain an array of char, wchar, or dchar internally. It would have to restrict its operations in exactly the same manner that the range functions for strings currently do, so exactly the same algorithms would or wouldn't work with it. And then you'd need to provide access to the underlying array of code units so that algorithms special-casing strings could operate on the array instead. Ultimately, it's pretty much the same thing, except now you have a wrapper struct. How does that buy you anything? The _only_ thing that it would buy you AFAIK is that foreach would then default to dchar instead of the code unit type.

The basic problem still exists. You still need to special-case strings for efficiency, and you still need to treat them as a range of dchar in the general case. It's an inherent issue with variable-length encodings. You can't just magically make it go away. If you have a better solution, please share it, but the fact that we want both efficiency and correctness binds us pretty thoroughly here.

- Jonathan M Davis
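The wrapper-struct alternative described in the last paragraph could look roughly like the following sketch. It is in Python rather than D and is for illustration only. The class `Utf8String`, its `code_units` property and the `find` function are hypothetical names; nothing here is a Phobos API. The wrapper iterates by code point and deliberately offers no length, indexing or slicing. It still has to expose its code units so that a special-cased algorithm can work on them directly. That is the point made above: the wrapper moves the problem without removing it.

```python
from typing import Iterator, Optional


class Utf8String:
    """Hypothetical wrapper: stores UTF-8 code units, iterates by code point."""

    def __init__(self, text: str) -> None:
        self._units = text.encode("utf-8")

    @property
    def code_units(self) -> bytes:
        # Escape hatch for algorithms that special-case strings for efficiency.
        return self._units

    def __iter__(self) -> Iterator[str]:
        # General-case view: one item per code point (D's dchar).
        return iter(self._units.decode("utf-8"))

    # Deliberately no __len__, __getitem__ or slicing: offering them would
    # mean either counting code points (O(n)) or leaking code-unit semantics,
    # which is the same restriction the range functions put on narrow strings.


def find(haystack: Utf8String, needle: Utf8String) -> Optional[int]:
    """Hypothetical special case: search the raw code units without decoding.

    This is valid for UTF-8 because a well-formed needle can only match at a
    code point boundary. Returns a code-unit offset, or None if absent.
    """
    pos = haystack.code_units.find(needle.code_units)
    return None if pos < 0 else pos
```

For example, `find(Utf8String("naïve café"), Utf8String("café"))` returns 7, a code-unit offset, while the same position is code point 6. Callers still need to know which view a result is expressed in.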
