On Wed, 21 Sep 2011 20:20:55 +0200, Christophe Travert
<[email protected]> wrote:
Yeah, well, as long as char is a Unicode code unit, that's the way that it
goes.
They are not Unicode code units.
import std.stdio;

void main() {
    char a = 'ä';
    writeln(a); // outputs: \344
    writeln('ä'); // outputs: ä
}
Obviously, a code unit doesn't fit in a char.
Thus 'char[]' is not what the name claims it is.
Oh, it absolutely is. According to the Unicode Consortium, a code unit is
"The minimal bit combination that can represent a unit of encoded text
for processing or interchange. The Unicode Standard uses 8-bit code units
in the UTF-8 encoding form [...]".
What you are thinking about is a code point.
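To make the distinction concrete, here is a minimal sketch, assuming a recent Phobos where strings are still auto-decoded by the range primitives. The helper names codeUnitCount and codePointCount are made up for illustration. The literal is written as an escape so the result does not depend on how the source file is encoded.

```d
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical helper: number of UTF-8 code units (chars) in the string.
size_t codeUnitCount(string s)
{
    return s.length;
}

// Hypothetical helper: number of code points, obtained by walking the
// string as a range of dchar (relies on Phobos auto-decoding).
size_t codePointCount(string s)
{
    return s.walkLength;
}

void main()
{
    string s = "\u00E4"; // 'ä', stored as the two code units 0xC3 0xA4
    writeln(codeUnitCount(s));  // 2
    writeln(codePointCount(s)); // 1
}
```

So a single char holds one code unit of 'ä', not the whole code point, which is why assigning the character to a char gives a value that is not valid UTF-8 on its own.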
Unicode operations should be supported by a different class that
is really a lazy range of dchar implemented over an underlying char[], with
no length, index, or stride operator, and appropriate optimizations.
I can agree with this, but the benefits over what we already have are nigh
zilch.
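
For reference, the kind of wrapper proposed above could look roughly like the sketch below. CodePoints is a hypothetical name, not anything in Phobos, and the sketch leaves out the "appropriate optimizations". It only shows a forward range of dchar over UTF-8 storage that deliberately offers no length and no indexing, using std.utf.decode to do the work.

```d
import std.stdio : writeln;
import std.utf : decode;

// Hypothetical wrapper: a lazy forward range of dchar over UTF-8 code units.
// It exposes no length and no opIndex on purpose.
struct CodePoints
{
    private const(char)[] data; // underlying UTF-8 code units

    @property bool empty()
    {
        return data.length == 0;
    }

    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i); // decode the code point starting at index 0
    }

    void popFront()
    {
        size_t i = 0;
        decode(data, i);       // i now points just past the first code point
        data = data[i .. $];
    }

    @property CodePoints save()
    {
        return CodePoints(data);
    }
}

void main()
{
    // "h\u00E4h" is four code units but three code points.
    foreach (dchar c; CodePoints("h\u00E4h"))
        writeln(c);
}
```

Iterating a plain string with foreach (dchar c; s) already decodes in the same way, which is much of why the gain is small.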
--
Simen