On Wed, 21 Sep 2011 20:20:55 +0200, Christophe Travert <[email protected]> wrote:

Yeah, well, as long as char is a unicode code unit, that's the way that it
goes.

They are not unicode units.

import std.stdio;

void main() {
  char a = 'ä';
  writeln(a); // outputs: \344
  writeln('ä'); // outputs: ä
}

Obviously, a code unit doesn't fit in a char.
Thus 'char[]' is not what the name claims it is.

Oh, it absolutely is. According to the Unicode Consortium, a code unit is
"The minimal bit combination that can represent a unit of encoded text
for processing or interchange. The Unicode Standard uses 8-bit code units
in the UTF-8 encoding form [...]".

What you are thinking about is a code point.
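
To make the distinction concrete, here is a minimal sketch, assuming a
reasonably current Phobos. The helper names codeUnits and codePoints are
made up for this illustration and are not part of any library.

```d
import std.range : walkLength;
import std.stdio;

// Hypothetical helper: number of UTF-8 code units (chars) in s.
size_t codeUnits(string s) { return s.length; }

// Hypothetical helper: number of code points in s.
// walkLength iterates the string as a range of dchar, decoding as it goes.
size_t codePoints(string s) { return s.walkLength; }

void main() {
    // 'ä' written as an escape so the source file encoding does not matter.
    string s = "\u00E4";
    writeln(codeUnits(s));  // 2
    writeln(codePoints(s)); // 1

    foreach (char c; s)     // iterate code units
        writefln("%02X", cast(uint) c);   // C3, then A4
    foreach (dchar c; s)    // iterate code points (decoded)
        writefln("U+%04X", cast(uint) c); // U+00E4
}
```

A single char holds one of the two code units, which is why writeln(a)
above prints a lone, invalid byte instead of the character.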


Unicode operations should be supported by a different class that
is really a lazy range of dchar implemented over an underlying char[], with
no length, index, or stride operator, and appropriate optimizations.

I can agree with this, but the benefits over what we already have are nigh
zilch.
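
As a rough sketch of what "what we already have" means here: the range
primitives in Phobos already treat a char[] as a range of dchar, decoding
lazily one code point at a time. The function name decodeAll is made up
for this example, and it assumes a Phobos where std.range exposes empty,
front and popFront for narrow strings.

```d
import std.range;
import std.stdio;

// Hypothetical helper: collect the code points of a UTF-8 string
// using only the range primitives.
dchar[] decodeAll(const(char)[] s) {
    dchar[] result;
    while (!s.empty) {
        result ~= s.front; // front decodes one code point from the code units
        s.popFront();      // popFront skips every code unit of that code point
    }
    return result;
}

void main() {
    writeln(decodeAll("a\u00E4b").length); // 3 code points from 4 code units
}
```

The difference from the proposed class is that the array still exposes
length and indexing in code units.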


--
  Simen
