Re: D's confusing strings (was Re: D on hackernews)

Simen Kjaeraas Wed, 21 Sep 2011 12:05:44 -0700

On Wed, 21 Sep 2011 20:20:55 +0200, Christophe Travert <[email protected]> wrote:

Yeah, well, as long as char is a Unicode code unit, that's the way that it
goes.


They are not Unicode units.

import std.stdio;

void main() {
  char a = 'ä';   // a holds the single byte 0xE4 (octal 344), not a UTF-8 sequence
  writeln(a); // outputs: \344
  writeln('ä'); // outputs: ä
}

Obviously, a code unit doesn't fit in a char.
Thus 'char[]' is not what the name claims it is.


Oh, it absolutely is. According to the Unicode Consortium, a code unit is
"The minimal bit combination that can represent a unit of encoded text
for processing or interchange. The Unicode Standard uses 8-bit code units
in the UTF-8 encoding form [...]".

What you are thinking about is a code point.
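
To make the distinction concrete, here is a minimal sketch (assuming the
source file is saved as UTF-8): 'ä' is one code point, U+00E4, but UTF-8
encodes it as two 8-bit code units, and each of those fits in a char.

```d
import std.range : walkLength;

void main()
{
    string s = "ä";   // U+00E4, stored as the UTF-8 bytes 0xC3 0xA4

    // .length and indexing work on code units (chars).
    assert(s.length == 2);
    assert(s[0] == 0xC3 && s[1] == 0xA4);

    // Walking the string as a range decodes it, so this counts code points.
    assert(s.walkLength == 1);

    // foreach with an explicit dchar also decodes one code point per step.
    foreach (dchar c; s)
        assert(c == 0xE4);
}
```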

Unicode operations should be supported by a different class that
is really a lazy range of dchar implemented as an underlying char[], with
no length, index, or stride operator, and appropriate optimizations.
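
A rough sketch of what such a type might look like, for illustration only.
The name CodePoints is hypothetical and not a Phobos type; it decodes with
std.utf.decode and leaves out the "appropriate optimizations":

```d
import std.algorithm : equal;
import std.utf : decode;

// Hypothetical wrapper: a lazy input range of dchar over UTF-8 code units.
// It deliberately offers no length, no indexing and no stride.
struct CodePoints
{
    private const(char)[] data;   // underlying code units

    @property bool empty() const { return data.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i);   // decode the code point starting at data[0]
    }

    void popFront()
    {
        size_t i = 0;
        decode(data, i);          // i is advanced past one whole code point
        data = data[i .. $];
    }
}

void main()
{
    // Yields the same code points as the UTF-32 literal.
    assert(equal(CodePoints("häß"), "häß"d));
}
```

Because decode throws on malformed input, this sketch rejects invalid UTF-8
rather than passing stray bytes through. Phobos's std.utf.byDchar gives a
comparable lazy view over a string.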


I can agree with this, but the benefits over what we already have are nigh
zilch.


--
  Simen
