On 2011-12-30 23:00:49 +0000, Andrei Alexandrescu <[email protected]> said:

> Using .raw is /optimal/ because it states the assumption appropriately. The user knows '$' cannot be in the prefix of any other symbol, so she can state the byte alone is the character. If that were a non-ASCII character, the assumption wouldn't have worked.
>
> So yeah, UTF-8 is great. But it is not miraculous. We need .raw.
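
The assumption in that quote can be sketched in a few lines of D. This is only an illustration: .raw is the property proposed in this thread and does not exist, so a cast to immutable(ubyte)[] stands in for it, and findDollar is a hypothetical helper name.

```d
import std.algorithm : countUntil;

// Hypothetical helper: search the raw UTF-8 bytes for the single byte '$'.
// The cast stands in for the proposed .raw property (an assumption).
ptrdiff_t findDollar(string s)
{
    auto bytes = cast(immutable(ubyte)[]) s;
    return bytes.countUntil(cast(ubyte) '$');
}

void main()
{
    string s = "cost: 10\u20AC or 12$";   // \u20AC is the euro sign

    // ASCII bytes never occur inside a multi-byte UTF-8 sequence,
    // so a byte-level hit on '$' is always a real '$' character.
    auto i = findDollar(s);
    assert(i == 17);        // byte offset, not code point count
    assert(s[i] == '$');

    // The euro sign takes three bytes, so no single byte can stand for it
    // and the same one-byte search cannot be written for it.
    assert((cast(immutable(ubyte)[]) "\u20AC").length == 3);
}
```

The byte search is safe only because '$' is ASCII; for a non-ASCII needle one would have to search for the whole encoded sequence instead.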

After reading most of the thread, it seems to me that you're deconstructing strings as arrays one piece at a time, to the point where, instead of arrays, we'd basically get a string struct and do things on it. Maybe it's part of a grand scheme; more likely it's one realization after another leading to one change after another. Let's see where all this will lead us:

0. in the beginning, strings were char[] arrays
1. arrays are generalized as ranges
2. Phobos starts treating char arrays as bidirectional ranges of dchar (instead of random access ranges of char)
3. foreach on char[] should iterate over dchar by default
4. remove .length, random access, and slicing from char arrays
5. replace char[] with a struct { ubyte[] raw; }

Number 1 is great by itself, no debate there. Number 2 is debatable. Numbers 3 and 4 are somewhat required for consistency with number 2. Number 5 is just the logical conclusion of all these changes.
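
To make the gap between the array view and the range view (numbers 2 and 3) concrete, here is a small sketch, assuming Phobos' range primitives for narrow strings behave as described in number 2. The helper names codeUnits and codePoints are made up for the example.

```d
import std.range : walkLength, ElementType,
                   isBidirectionalRange, isRandomAccessRange;

// Range view: Phobos presents a string as a bidirectional range of dchar,
// not as a random access range of char.
static assert(is(ElementType!string == dchar));
static assert(isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);

// Hypothetical helper: count what foreach sees when asked for char.
size_t codeUnits(string s)
{
    size_t n;
    foreach (char c; s) ++n;    // iterates UTF-8 code units
    return n;
}

// Hypothetical helper: count what foreach sees when asked for dchar.
size_t codePoints(string s)
{
    size_t n;
    foreach (dchar c; s) ++n;   // decodes code points on the fly
    return n;
}

void main()
{
    string s = "n\u00E9";           // 'n' followed by e-acute (two bytes)

    assert(s.length == 3);          // array view: code units
    assert(s.walkLength == 2);      // range view: decoded code points
    assert(codeUnits(s) == 3);
    assert(codePoints(s) == 2);
}
```

The same variable answers 3 or 2 depending on which interface you ask, which is the inconsistency that numbers 3 and 4 would remove.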

If we want a fundamental change to what strings are in D, perhaps we should start focusing on the broader issue instead of trying to pass piecemeal changes one after the other. For consistency's sake, I think we should either stop after 1 or go all the way to 5. Either we do it fully or we don't do it at all.
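
For reference, going all the way to 5 might look roughly like the sketch below. It is not a proposal for an actual API: the type name String and its members are assumptions made for illustration, and decoding is delegated to std.utf.

```d
import std.utf : decode, stride;

// Hypothetical string type: the bytes are reachable only through .raw,
// and the type itself is an input range of dchar.
struct String
{
    immutable(ubyte)[] raw;

    @property bool empty() const { return raw.length == 0; }

    @property dchar front() const
    {
        size_t i = 0;
        return decode(cast(string) raw, i);   // decode the first code point
    }

    void popFront()
    {
        // stride gives the byte length of the code point starting at index 0
        raw = raw[stride(cast(string) raw, 0) .. $];
    }
}

// Hypothetical helper: count code points by walking the range.
size_t countPoints(String s)
{
    size_t n;
    foreach (dchar c; s) ++n;
    return n;
}

void main()
{
    auto s = String(cast(immutable(ubyte)[]) "n\u00E9");
    assert(countPoints(s) == 2);    // iteration yields code points
    assert(s.raw.length == 3);      // byte access is explicit via .raw
}
```

With such a type there is no .length, indexing, or slicing on the string itself, so every byte-level operation has to say so by going through .raw.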

All those divergent interpretations of strings end up hurting the language. Walter and Andrei ought to find a way to agree with each other.

--
Michel Fortin
[email protected]
http://michelf.com/
