On Monday, 20 January 2014 at 09:58:07 UTC, Jakob Ovrum wrote:
On Thursday, 16 January 2014 at 06:59:43 UTC, Maxim Fomin wrote:
This is wrong. A string in D is de facto (by implementation; the
spec may say whatever is convenient for advertising D) an array
of single bytes which can hold UTF-8 code units. There is no way
the string type in D is always a string in the sense of code
points/characters. Sometimes the string type happens to behave
like a 'string', but if you put UTF-16 or UTF-32 text into it, it
reminds you what the string type really is.
By implementation they are also UTF strings. String literals
use UTF, `char.init` is 0xFF and `wchar.init` is 0xFFFF,
foreach over narrow strings with a `dchar` iterator variable
does UTF decoding, etc.
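The auto-decoding behaviour mentioned above can be seen directly. A minimal sketch, counting code units versus code points for a small mixed-script string (the counts in the comments assume UTF-8 encoding of the literal):

```d
import std.stdio;

void main()
{
    string s = "säд"; // 's' is 1 code unit; 'ä' and 'д' are 2 each in UTF-8
    size_t units, points;
    foreach (char c; s)  ++units;  // iterates over the 5 UTF-8 code units
    foreach (dchar c; s) ++points; // auto-decodes: iterates over 3 code points
    writeln(units);  // 5
    writeln(points); // 3
}
```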
I don't think you know what you're talking about; putting
UTF-16 or UTF-32 in `string` is utter madness and not trivially
possible. We have `wchar`/`wstring` and `dchar`/`dstring` for
UTF-16 and UTF-32, respectively.
import std.stdio;

void main()
{
    string s = "о";
    writeln(s.length);
}
This compiles and prints 2. This means that the string type is
broken. It is broken in the way I was attempting to explain.
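(For reference, the code-unit/code-point distinction in the example above can be made explicit with Phobos. A sketch using `std.range.walkLength`, which auto-decodes narrow strings:)

```d
import std.range : walkLength;
import std.stdio;

void main()
{
    string s = "о"; // Cyrillic 'о', U+043E: two bytes in UTF-8
    writeln(s.length);     // 2: number of UTF-8 code units
    writeln(s.walkLength); // 1: number of code points after decoding
}
```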
This is an attempt to pass off a problematic design as a wise decision.
No, it's not. Please leave crappy, unsubstantiated arguments
like this out of these forums.
Note that I provided examples of why the design is problematic.
The argument isn't unsubstantiated.
[1] http://dlang.org/type
By the way, the link you provide says char is an unsigned 8-bit
type which can hold the value of a UTF-8 code unit.
Not *can*, but *does*. Otherwise it is an error in the program.
The specification, compiler implementation (as shown above) and
standard library all treat `char` as a UTF-8 code unit. Treat
it otherwise at your own peril.
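That "error in the program" can be surfaced explicitly. A sketch using `std.utf.validate`, which throws a `UTFException` on ill-formed UTF-8:

```d
import std.stdio;
import std.utf : validate, UTFException;

void main()
{
    // 0xE4 opens a two-byte UTF-8 sequence but has no continuation
    // byte here, so this char[] is not well-formed UTF-8
    char[] bad = ['s', cast(char) 0xE4];
    try
    {
        validate(bad);
        writeln("valid UTF-8");
    }
    catch (UTFException e)
        writeln("invalid UTF-8 detected"); // this branch runs
}
```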
But such treatment is nonsense. It is like treating an integer or
floating-point number as a sequence of bytes. You are essentially
saying that treating char as a UTF-8 code unit is OK because the
language treats char as a UTF-8 code unit.
The only problem in the implementation here that you illustrate
is that `['s', 'ä', 'д']` is of type `int[]`, which is a bug.
It should be `dchar[]`. The length of `char[]` works as
intended.
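A sketch contrasting the two: `char[].length` counts UTF-8 code units, while `dchar[].length` counts UTF-32 code units, which coincide with code points:

```d
import std.stdio;

void main()
{
    string  s = "säд"; // immutable(char)[]  : UTF-8
    dstring d = "säд"; // immutable(dchar)[] : UTF-32
    writeln(s.length); // 5: UTF-8 code units
    writeln(d.length); // 3: code points
}
```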
You are saying that the length of char[] works as intended, which
is true, but that only shows the design is broken.
Problems with the string type can be illustrated by an analogous
situation in the domain of integer types. Assume that a user
wants a 'number' type which accepts integers, floats and doubles
and treats them properly. This would require either a library
solution or a new special type in the language, supported by
both the compiler and the runtime library, which performs
operations at runtime on objects of the number type according to
their effective type.
D designers want to support such a feature (to make the language
better), but as happens in other situations, the support is
only limited: the compiler allows you to do

    alias immutable(int)[] number;
    number my_number = [0, 3.14, 3.14l];
I don't understand this example. The compiler does *not* allow
that code; try it for yourself.
It does not allow it because it is nonsense. However, it does
allow equivalent nonsense with character types.
    alias immutable(int)[] number;
    number my_number = [0, 3.14, 3.14l]; // does not compile

    alias immutable(char)[] string;
    string s = "säд"; // compiles, however "säд" should default to wstring or dstring
The same reasons which prevent a sane person from being OK with
`int[] number = [3.14l]` should prevent him from being OK with
`string s = "säд"`.