On 21/09/11 5:39 PM, Andrei Alexandrescu wrote:
On 9/21/11 10:16 AM, Christophe wrote:
Timon Gehr, in message (digitalmars.D:144889), wrote:
Unicode natively. Yet the 'D strings are strange and confusing' argument
comes up quite often on the web.
Well, I think they are. The ptr+length stuff is amazing, but the
behavior of strings in Phobos is weird.
Mini-quiz: what should std.range.drop(some_string, 1) do?
Hint: what it actually does is not what the documentation of Phobos
suggests*...
Strings are arrays of char, but they appear as a lazy range of dchar to
Phobos. I could cope with the fact that this is a little unexpected for
beginners. But it creates a lot of special cases in Phobos, like
the fact that you can't even copy a char[] to a char[] with
std.algorithm.copy. Not to mention all the optimizations that are not,
or cannot be, performed for those strings. I'll just remember to use
ubyte[] wherever I can...
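
To make the quiz concrete, here is a minimal sketch. It assumes a Phobos
release in which the range primitives auto-decode narrow strings, which is
the behavior described above. The helper name dropFirst is made up for
illustration only.

```d
import std.range; // drop, front, walkLength

// Hypothetical helper: drop one *range element* from a string.
string dropFirst(string s)
{
    return s.drop(1);
}

unittest
{
    string s = "été"; // each 'é' takes two UTF-8 code units

    // Array view: indexing yields a code unit (char).
    static assert(is(typeof(s[0]) == immutable(char)));
    // Range view: front yields a decoded code point (dchar).
    static assert(is(typeof(s.front) == dchar));

    assert(s.length == 5);        // code units
    assert(s.walkLength == 3);    // code points

    // drop(s, 1) removes one code point, i.e. two code units here.
    assert(dropFirst(s) == "té");
    assert(dropFirst(s).length == 3);
}
```

Something like rdmd -unittest -main on the file should run the checks. The
point is only that .length and drop count different things for the same
value.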
String handling in D is good modulo the oddities you noticed. What would
make it perfect would be:
* Add property .rep that returns byte[], ushort[], or uint[] for char[],
wchar[], dchar[] respectively (with the appropriate qualifier).
* Replace .length with .codeUnits.
* Disallow [n] and [m .. n]
This would upgrade D's strings from good to awesome. Really it would be
a dream come true. Unfortunately it would also break most D code there
is out there. I don't see how we can improve the current situation while
staying backward compatible.
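
For what the first bullet could look like in practice, here is a rough
sketch using std.string.representation from current Phobos, which returns
the unsigned types (ubyte[], ushort[], uint[]) with the original qualifier
kept. Treating it as a stand-in for the proposed .rep is an assumption, and
the helper name codeUnitCount is invented for the example.

```d
import std.string : representation;

// Hypothetical helper: count UTF-8 code units through the raw view.
size_t codeUnitCount(string s)
{
    immutable(ubyte)[] raw = s.representation; // qualifier is preserved
    return raw.length;
}

unittest
{
    string s = "été";
    auto raw = s.representation;
    static assert(is(typeof(raw) == immutable(ubyte)[]));

    assert(codeUnitCount(s) == 5);
    // 'é' (U+00E9) is encoded as the two bytes C3 A9 in UTF-8.
    assert(raw[0] == 0xC3 && raw[1] == 0xA9);

    // Mutable and wide strings map to the matching integer arrays.
    char[] m = "abc".dup;
    static assert(is(typeof(m.representation) == ubyte[]));
    wstring w = "é"w;
    static assert(is(typeof(w.representation) == immutable(ushort)[]));
}
```

Indexing and slicing on the raw view are unambiguous, since every element
is one code unit.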
Andrei
From what I can see, the problem with D strings is that they are a
'magic' special case for arrays.
char[] should be an array of char, just like int[] is an array of int.
If you have a T[] arr, then typeof(arr.front) should be T. This is what
everyone would expect. char[] should essentially be the same as byte[],
although char[] would be more natural for ASCII strings.
string should be something different, a separate type. As you say,
disallowing [n] and [m..n] would be good, as they make no sense with a
variable-length encoding (VLE).
You could have .length and .codeUnits, but length would have to be O(n).
That's not ideal, but since string wouldn't be an array, it doesn't need
to have the same complexity guarantees.
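
A very rough sketch of what I mean by a separate type, just to show the
shape of it. The name Text and all of its members are hypothetical, it only
handles UTF-8, and it leans on std.utf (count, decode, stride) for the
decoding.

```d
import std.utf : count, decode, stride;

// Hypothetical string type: not an array, so no [n] and no [m .. n].
struct Text
{
    private string data;

    this(string s) { data = s; }

    // O(1): number of UTF-8 code units stored.
    @property size_t codeUnits() { return data.length; }

    // O(n): number of code points, found by scanning the data.
    @property size_t length() { return count(data); }

    // Input-range interface over code points.
    @property bool empty() { return data.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i); // decodes the first code point
    }

    void popFront()
    {
        data = data[stride(data, 0) .. $]; // skip one whole code point
    }
}

unittest
{
    auto t = Text("été");
    assert(t.codeUnits == 5);
    assert(t.length == 3);
    assert(t.front == 'é');

    t.popFront();
    assert(t.front == 't');
    assert(t.codeUnits == 3);

    // Indexing is simply not part of the interface.
    static assert(!__traits(compiles, t[0]));
}
```

With something like this, char[] could go back to being a plain array
whose front is a char.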
Same for wchar[], dchar[], wstring and dstring.
Of course, making that change would break existing code. Maybe D3? :-)