On Monday, 20 January 2014 at 09:58:07 UTC, Jakob Ovrum wrote:
On Thursday, 16 January 2014 at 06:59:43 UTC, Maxim Fomin wrote:
This is wrong. String in D is de facto (by implementation; the spec may say whatever is convenient for advertising D) an array of single bytes which can hold UTF-8 code units. There is no way the string type in D is always a string in the sense of code points/characters. Sometimes the string type happens to behave like a 'string', but if you put UTF-16 or UTF-32 text into it, it will remind you what the string type really is.

By implementation they are also UTF strings. String literals use UTF, `char.init` is 0xFF and `wchar.init` is 0xFFFF, and foreach over narrow strings with a `dchar` iteration variable does UTF decoding, etc.
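That decoding behaviour is easy to observe. A minimal sketch (assuming a current DMD/Phobos; the byte values in the comments describe UTF-8 encoding, not program output I have verified here):

import std.stdio;

void main()
{
    string s = "säд"; // 5 UTF-8 code units: 's' is 1 byte, 'ä' and 'д' are 2 each

    // With a dchar loop variable, foreach decodes UTF-8 on the fly,
    // yielding the 3 code points: s, ä, д
    foreach (dchar c; s)
        write(c, ' ');
    writeln();

    // With a char loop variable, foreach yields the 5 raw code units
    foreach (char c; s)
        writef("%02x ", cast(ubyte) c);
    writeln();
}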

I don't think you know what you're talking about; putting UTF-16 or UTF-32 in `string` is utter madness and not trivially possible. We have `wchar`/`wstring` and `dchar`/`dstring` for UTF-16 and UTF-32, respectively.


import std.stdio;

void main()
{
    string s = "о"; // Cyrillic 'о' (U+043E): two UTF-8 code units
    writeln(s.length);
}

This compiles and prints 2. This means the string type is broken. It is broken in exactly the way I was attempting to explain.
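For comparison, here is the same character stored in each of D's three string types; `.length` counts code units of the respective encoding, so only `dstring` counts code points (a sketch, assuming a current DMD):

import std.stdio;

void main()
{
    writeln("о".length);  // 2 -- string:  UTF-8 code units
    writeln("о"w.length); // 1 -- wstring: UTF-16 code units
    writeln("о"d.length); // 1 -- dstring: UTF-32 code units == code points
}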

This is an attempt to explain away a problematic design as a wise decision.

No, it's not. Please leave crappy, unsubstantiated arguments like this out of these forums.

Note that I provided examples of why the design is problematic. The argument isn't unsubstantiated.


[1] http://dlang.org/type

By the way, the link you provided says char is an unsigned 8-bit type which can hold a UTF-8 code unit.

Not *can*, but *does*. Otherwise it is an error in the program. The specification, the compiler implementation (as shown above) and the standard library all treat `char` as a UTF-8 code unit. Treat it otherwise at your own peril.
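The standard library makes this treatment explicit: `std.utf.validate` throws a `UTFException` when a `char[]` does not hold well-formed UTF-8. A small sketch (assuming a current Phobos; the `"\xFF"` literal is just a convenient way to produce an invalid byte):

import std.utf : validate, UTFException;
import std.exception : assertThrown;

void main()
{
    string good = "säд";
    validate(good); // well-formed UTF-8: no exception

    // A string holding a byte that is not valid UTF-8 is,
    // per the spec, an error in the program:
    string bad = "\xFF";
    assertThrown!UTFException(validate(bad));
}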


But such treatment is nonsense. It is like treating an integer or a floating-point number as a sequence of bytes. You are essentially saying that treating char as a UTF-8 code unit is OK because the language treats char as a UTF-8 code unit.

The only implementation problem you illustrate here is that `['s', 'ä', 'д']` is inferred as type `int[]`, which is a bug. It should be `dchar[]`. The length of `char[]` works as intended.
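With the element type stated explicitly, the literal behaves as one would expect; the inference bug only affects the untyped form (a sketch, assuming a current DMD):

void main()
{
    dchar[] a = ['s', 'ä', 'д']; // explicit dchar[]: one element per character
    assert(a.length == 3);
}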

You are saying that the length of `char[]` works as intended, which is true, but that only shows the design is broken.

The problems with the string type can be illustrated by an analogous situation in the domain of integer types. Assume a user wants a 'number' type which accepts integers, floats and doubles alike and treats them properly. This would require either a library solution or a special new type in the language, supported by both the compiler and the runtime library, which performs operations on objects of the number type at runtime according to their effective type.

D's designers want to support such a feature (to make the language better), but as happens in other situations, the support is only partial: the compiler allows you to write

alias immutable(int)[] number;
number my_number = [0, 3.14, 3.14L];

I don't understand this example. The compiler does *not* allow that code; try it for yourself.

It does not allow it because it is nonsense. However, it does allow the equivalent nonsense for character types.

alias immutable(int)[] number;
number my_number = [0, 3.14, 3.14L]; // does not compile

alias immutable(char)[] string;
string s = "säд"; // compiles, however "säд" should default to wstring or dstring

The same reasons that prevent a sane person from being OK with `int[] number = [3.14L]` should prevent him from being OK with `string s = "säд"`.
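For what it's worth, the standard library does offer a code-point view over the same bytes: the range primitives auto-decode narrow strings, so `walkLength` counts code points while `.length` counts code units. A sketch (assuming a current Phobos):

import std.range : walkLength;
import std.stdio;

void main()
{
    string s = "säд";
    writeln(s.length);     // 5 -- UTF-8 code units
    writeln(s.walkLength); // 3 -- decoded code points
}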
