On Monday, 20 January 2014 at 09:58:07 UTC, Jakob Ovrum wrote:
On Thursday, 16 January 2014 at 06:59:43 UTC, Maxim Fomin wrote:
This is wrong. A string in D is de facto (by implementation; the
spec may say whatever is convenient for advertising D) an array
of single bytes which can hold UTF-8 code units. There is no way
the string type in D is always a string in the sense of code
points/characters. Sometimes the string type happens to behave
like a 'string', but if you put UTF-16 or UTF-32 text into it, it
reminds you what the string type really is.
By implementation they are also UTF strings. String literals
use UTF, `char.init` is 0xFF and `wchar.init` is 0xFFFF,
foreach over narrow strings with a `dchar` iterator variable
does UTF decoding, etc.
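The auto-decoding behaviour mentioned above can be seen directly. A minimal sketch, counting code units versus code points for a small mixed-script string (the counts in the comments assume UTF-8 encoding of the literal):

```d
import std.stdio;

void main()
{
    string s = "säд"; // 's' is 1 code unit; 'ä' and 'д' are 2 each in UTF-8
    size_t units, points;
    foreach (char c; s)  ++units;  // iterates over the 5 UTF-8 code units
    foreach (dchar c; s) ++points; // auto-decodes: iterates over 3 code points
    writeln(units);  // 5
    writeln(points); // 3
}
```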
I don't think you know what you're talking about; putting
UTF-16 or UTF-32 in `string` is utter madness and not trivially
possible. We have `wchar`/`wstring` and `dchar`/`dstring` for
UTF-16 and UTF-32, respectively.
import std.stdio;

void main()
{
    string s = "о";
    writeln(s.length);
}
This compiles and prints 2. This means that the string type is
broken. It is broken in the way I was attempting to explain.
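(For reference, the code-unit/code-point distinction in the example above can be made explicit with Phobos. A sketch using `std.range.walkLength`, which auto-decodes narrow strings:)

```d
import std.range : walkLength;
import std.stdio;

void main()
{
    string s = "о"; // Cyrillic 'о', U+043E: two bytes in UTF-8
    writeln(s.length);     // 2: number of UTF-8 code units
    writeln(s.walkLength); // 1: number of code points after decoding
}
```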
This is an attempt to pass off a problematic design as a wise decision.
No, it's not. Please leave crappy, unsubstantiated arguments
like this out of these forums.
Note that I provided examples of why the design is problematic.
The argument isn't unsubstantiated.
[1] http://dlang.org/type
By the way, the link you provide says char is an unsigned 8-bit
type which can hold the value of a UTF-8 code unit.
Not *can*, but *does*. Otherwise it is an error in the program.
The specification, compiler implementation (as shown above) and
standard library all treat `char` as a UTF-8 code unit. Treat
it otherwise at your own peril.
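That "error in the program" can be surfaced explicitly. A sketch using `std.utf.validate`, which throws a `UTFException` on ill-formed UTF-8:

```d
import std.stdio;
import std.utf : validate, UTFException;

void main()
{
    // 0xE4 opens a two-byte UTF-8 sequence but has no continuation
    // byte here, so this char[] is not well-formed UTF-8
    char[] bad = ['s', cast(char) 0xE4];
    try
    {
        validate(bad);
        writeln("valid UTF-8");
    }
    catch (UTFException e)
        writeln("invalid UTF-8 detected"); // this branch runs
}
```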
But such treatment is nonsense. It is like treating an integer or
floating-point number as a sequence of bytes. You are essentially
saying that treating char as a UTF-8 code unit is OK because the
language treats char as a UTF-8 code unit.
The only problem in the implementation here that you illustrate
is that `['s', 'ä', 'д']` is of type `int[]`, which is a bug.
It should be `dchar[]`. The length of `char[]` works as
intended.
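A sketch contrasting the two: `char[].length` counts UTF-8 code units, while `dchar[].length` counts UTF-32 code units, which coincide with code points:

```d
import std.stdio;

void main()
{
    string  s = "säд"; // immutable(char)[]  : UTF-8
    dstring d = "säд"; // immutable(dchar)[] : UTF-32
    writeln(s.length); // 5: UTF-8 code units
    writeln(d.length); // 3: code points
}
```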
You are saying that the length of char[] works as intended, which
is true, but that only shows the design is broken.
Problems with the string type can be illustrated by an analogous
situation in the domain of integer types. Assume that a user
wants a 'number' type which accepts integers, floats and doubles
and treats them properly. This would require either a library
solution or a new special type in the language, supported by
both the compiler and the runtime library, which performs
operations at runtime on objects of the number type according to
their effective type.
D designers want to support such a feature (to make the language
better), but as happens in other situations, the support is
only limited: the compiler allows you to do

    alias immutable(int)[] number;
    number my_number = [0, 3.14, 3.14l];
I don't understand this example. The compiler does *not* allow
that code; try it for yourself.
It does not allow it because it is nonsense. However, it does
allow equivalent nonsense with character types.
    alias immutable(int)[] number;
    number my_number = [0, 3.14, 3.14l]; // does not compile

    alias immutable(char)[] string;
    string s = "säд"; // compiles, however "säд" should default to wstring or dstring
The same reasons which prevent a sane person from being OK with
`int[] number = [3.14l]` should prevent him from being OK with
`string s = "säд"`.