On 12/30/2011 11:09 PM, Andrei Alexandrescu wrote:
On 12/30/11 10:09 PM, Walter Bright wrote:
I'm not so sure about that. Timon Gehr's X macro tried to handle UTF-8
correctly, but it turned out that the naive version that used [i] and
.length worked correctly. This is typical, not exceptional.
The lower frequency of bugs makes them that much more difficult to spot. This is
essentially similar to the UTF16/UCS-2 morass: the vast majority of the time
the programmer may consider UTF16 a coding with one code unit per code point
(which is what UCS-2 is). The existence of surrogates didn't make much of a
difference because, again, very often the wrong assumption just worked. Well,
that didn't go over all that well.
I'm not so sure it's quite the same. Java was designed before there were
surrogate pairs; they kinda got the rug pulled out from under them. So, they
simply have no decent way to deal with it. There isn't even a notion of a dchar
character type. Java was designed with codeunit==codepoint, it is embedded in
the design of the language, library, and culture.
This is not true of D. It's designed from the ground up to deal properly with
UTF. D has very simple language features to deal with it.
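To make that concrete, here is a minimal sketch of the two views D gives you on a narrow string. The helper names countCodePoints and indexOfAscii are made up for illustration and are not part of Phobos. The first relies on foreach decoding to dchar; the second is the naive [i]/.length scan, which stays correct for ASCII needles because bytes below 0x80 never occur inside a multi-byte UTF-8 sequence.

```d
import std.stdio : writeln;

// Hypothetical helper: count code points by letting foreach decode UTF-8.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s) // a dchar loop variable makes foreach decode code points
        ++n;
    return n;
}

// Hypothetical helper: naive code-unit scan using [i] and .length.
// Assumes the needle is ASCII; returns a code-unit index, or -1 if not found.
ptrdiff_t indexOfAscii(string s, char needle)
{
    assert(needle < 0x80, "needle must be ASCII");
    for (size_t i = 0; i < s.length; ++i)
    {
        if (s[i] == needle)
            return cast(ptrdiff_t) i;
    }
    return -1;
}

void main()
{
    string s = "h\u00E9llo";       // U+00E9 takes two UTF-8 code units
    writeln(s.length);             // number of code units
    writeln(countCodePoints(s));   // number of code points
    writeln(indexOfAscii(s, 'l')); // code-unit index, usable for slicing s
}
```

The index that the naive scan returns is a code-unit offset, so it is only meaningful for slicing the same string, not as a character count.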
We need .raw and we must abolish .length and [] for narrow strings.
I don't believe that fixes anything, and it breaks every D project out there. We're
chasing phantoms here, and I worry a lot about over-engineering trivia.
And, we already have a type to deal with it: dstring.
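A small sketch of what that looks like, assuming std.conv.to is used for the conversion (std.utf has equivalents). The function name toCodePoints is invented for the example. With a dstring, .length and [] count and index code points, including for characters outside the BMP that need a surrogate pair in UTF-16.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper: convert a narrow string to UTF-32.
dstring toCodePoints(string s)
{
    return to!dstring(s);
}

void main()
{
    string s = "h\u00E9llo";
    dstring d = toCodePoints(s);
    writeln(s.length); // UTF-8 code units
    writeln(d.length); // code points
    writeln(d[1]);     // indexing yields a whole code point

    // U+1D11E lies outside the BMP: two UTF-16 code units, one UTF-32 code unit.
    wstring w = "\U0001D11E"w;
    dstring c = "\U0001D11E"d;
    writeln(w.length);
    writeln(c.length);
}
```

The cost is memory: four bytes per code point, and a conversion whenever the data starts out as UTF-8 or UTF-16.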