On Friday, December 30, 2011 20:55:42 Timon Gehr wrote:
> 1. They don't notice. Then it is not a problem, because they are
> obviously only using ASCII characters and it is perfectly reasonable to
> assume that code units and characters are the same thing.
The problem is that what's more likely to happen in a lot of cases is that they use it wrong and _don't_ notice, because they only use ASCII in testing, _but_ their code ends up with bugs all over the place, because it's actually used with unicode in the field. Yes, diligent programmers will generally find such problems, but with the current scheme, it's _so_ easy to use length when you shouldn't that it's pretty much a guarantee that it's going to happen (see the P.S. for a minimal example).

I'm not sure that Andrei's suggestion is the best one at this point, but I sure wouldn't be against it being introduced. It wouldn't entirely fix the problem by any means, but programmers would then have to work harder at screwing things up, so there would be fewer mistakes.

Arguably, the first issue with D strings is that we have char at all. In most languages, char is supposed to be a character, so many programmers will code with that expectation. If we had something like utf8unit, utf16unit, and utf32unit (arguably very bad, albeit descriptive, names) and no char, then it would force programmers to become semi-educated about the issues. There's no way that that's changing at this point though.

- Jonathan M Davis
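P.S. A minimal sketch of the kind of bug that length invites (the string "über" and the use of walkLength from std.range are my choices for illustration):

import std.range : walkLength;
import std.stdio;

void main()
{
    string s = "über"; // 4 characters, but 'ü' is 2 UTF-8 code units
    writeln(s.length);     // prints 5: code units, not characters
    writeln(s.walkLength); // prints 4: decoded code points
}

With ASCII-only test data the two counts always agree, so the bug stays hidden until unicode shows up in the field.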
