On 06/19/2013 05:34 AM, monarch_dodra wrote:

> I know a "binary" char can hold the values 0 to 0xFF. However, I'm
> wondering about the cases where a codepoint can fit inside a char. For
> example, 'ç' is represented by 0xe7, which technically fits inside a char.

'ç' is represented by the single byte 0xe7 in an encoding that is not UTF-8 (Latin-1, for example). :) In UTF-8, the same character takes two bytes: 0xc3 0xa7.

That would be a special agreement between the producer and the consumer of that string. Otherwise, 0xe7 is not 'ç'. I recommend ubyte[] for those cases.
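As a rough sketch of the ubyte[] approach, assuming the agreed-upon encoding is Latin-1 (the helper name latin1ToUtf8 is made up for this example and is not a Phobos function):

```d
import std.stdio;

// Hypothetical helper: converts Latin-1 bytes to a UTF-8 string.
string latin1ToUtf8(const(ubyte)[] bytes)
{
    string result;
    foreach (b; bytes)
    {
        // Latin-1 byte values map one-to-one to code points U+0000..U+00FF;
        // appending a dchar to a string encodes it as UTF-8.
        result ~= cast(dchar) b;
    }
    return result;
}

void main()
{
    ubyte[] raw = [ 0xe7 ];          // 'ç' in Latin-1; not valid UTF-8 by itself
    string s = latin1ToUtf8(raw);
    writeln(s);
    assert(s.length == 2);           // two UTF-8 code units: 0xc3 0xa7
}
```

The raw bytes stay in ubyte[] until they are explicitly converted, so nothing ever treats them as UTF-8 by accident.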

In UTF-8, 0xe7 is the first byte of a 3-byte sequence that encodes a single code point:

import std.stdio;

void main()
{
    char[] a = [ 'a', 'b', 'c', 0xe7, 0x80, 0x80 ];
    writeln(a);
}

That prints "abc" followed by a Chinese character (U+7000):

abc瀀
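To see the difference between code units and code points for that same array, something like the following should work (a small sketch; it relies on walkLength from std.range decoding the char[] as it walks it):

```d
import std.range : walkLength;
import std.stdio;

void main()
{
    char[] a = [ 'a', 'b', 'c', 0xe7, 0x80, 0x80 ];

    writeln(a.length);       // number of UTF-8 code units (bytes)
    writeln(a.walkLength);   // number of code points after decoding

    // Iterating by dchar decodes each UTF-8 sequence on the fly.
    foreach (dchar c; a)
    {
        writefln("U+%04X", cast(uint) c);
    }

    assert(a.length == 6);
    assert(a.walkLength == 4);
}
```

The last three chars are not three characters; together they form one code point.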

Ali
