Re: Why is string.front dchar?

Maxim Fomin Wed, 15 Jan 2014 23:02:34 -0800

On Thursday, 16 January 2014 at 05:56:48 UTC, Jakob Ovrum wrote:

On Tuesday, 14 January 2014 at 11:42:34 UTC, Maxim Fomin wrote:
The root of the issue is that string literals containing characters which do not fit into a single byte are still converted to a char[] array. Strictly speaking, this is not type-safe, because it allows a 2- or 4-byte code unit to be reinterpreted as a sequence of 1-byte characters. The string type is in some sense problematic in D. That is why having .front return dchar is a way to correct the problem; it is not an attempt to introduce confusion.
This assertion makes all the wrong assumptions.
`char` is a UTF-8 code unit[1], and `string` is an array of immutable UTF-8 code units. The whole point of UTF-8 is the ability to encode code points that need multiple bytes (UTF-8 code units), so the string literal behaviour is perfectly regular.

This is wrong. A string in D is de facto (by implementation; the spec may say whatever is convenient for advertising D) an array of single bytes which can hold UTF-8 code units. In no way is the string type in D always a string in the sense of code points/characters. Sometimes the string type happens to behave like a 'string', but if you put UTF-16 or UTF-32 text into it, it reminds you what the string type really is.

Operations on code units are rare, which is why the standard library instead treats strings as ranges of code points, for correctness by default. However, we must not prevent the user from being able to work on arrays of code units, as many string algorithms can be optimized by not doing full UTF decoding. The standard library does this on many occasions, and there are more to come.
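
The distinction drawn above can be sketched in a few lines of D. This is only an illustration, assuming a D2 compiler with a Phobos version in which narrow strings are auto-decoded by the range primitives and in which `std.utf.byCodeUnit` is available; the variable name `s` is made up for the example.

```d
import std.range.primitives : front, walkLength;
import std.utf : byCodeUnit;

void main()
{
    string s = "säд";

    // .length counts UTF-8 code units: 's' takes 1, 'ä' takes 2, 'д' takes 2.
    assert(s.length == 5);

    // Indexing yields a single code unit.
    static assert(is(typeof(s[0]) == immutable(char)));

    // The range primitive front decodes a whole code point.
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == 's');

    // Walking the string as a range counts code points.
    assert(s.walkLength == 3);

    // Opting out of decoding: iterate over the code units instead.
    assert(s.byCodeUnit.walkLength == 5);
}
```

Array operations (`length`, indexing) and range operations (`front`, `walkLength`) therefore disagree on what an element of the same `string` is, which is the behaviour this thread is arguing about.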


This is an attempt to explain a problematic design as a wise decision.

Note that the Unicode definition of an unqualified "character" is the translation of a code *point*, which is very different from a *glyph*, which is what people generally associate the word "character" with. Thus, `string` is not an array of characters (i.e. an array where each element is a character), but `dstring` can be said to be.
[1] http://dlang.org/type

By the way, the link you provide says char is an unsigned 8-bit type which can hold the value of a UTF-8 code unit.

UTF is irrelevant because the problem is in the D implementation. See http://forum.dlang.org/thread/hoopiiobddbapybbw...@forum.dlang.org (in particular the 2nd page).

The root of the issue is that D does not provide a 'utf' type which would correctly handle strings and characters irrespective of the format. Instead, the language pretends that it supports such a type by allowing both literals "sad" and "säд" to be converted to a single-byte char array. And ['s', 'ä', 'д'] is, by the way, neither char[], nor wchar[], nor even dchar[], but a sequence of integers, which compounds the oddities in the character types.
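
As a rough sketch of what happens to the two literals, assuming a D2 compiler, a UTF-8 encoded source file, and `std.string.representation` from Phobos (the variable names are invented for the example):

```d
import std.string : representation;

void main()
{
    string  narrow = "säд";   // stored as UTF-8 code units
    wstring wide   = "säд";   // stored as UTF-16 code units
    dstring full   = "säд";   // stored as UTF-32 code units

    // The same three characters have different array lengths per encoding.
    assert("sad".length == 3);
    assert(narrow.length == 5);
    assert(wide.length == 3);
    assert(full.length == 3);

    // The raw bytes behind the UTF-8 literal.
    immutable(ubyte)[] bytes = narrow.representation;
    immutable(ubyte)[] expected = [0x73, 0xC3, 0xA4, 0xD0, 0xB4];
    assert(bytes == expected);

    // Indexing lands on the first byte of 'ä', not on a whole character.
    assert(narrow[1] == 0xC3);
}
```

Both literals are accepted as `string`, but only for "sad" does each array element correspond to one character.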

The problems with the string type can be illustrated by a hypothetical situation in the domain of integer types. Assume that the user wants a 'number' type which accepts integers, floats and doubles alike and treats them properly. This would require either a library solution or a new special type in the language, supported by both the compiler and the runtime library, which performs operations at runtime on objects of the number type according to their effective type.

The D designers want to support such a feature (to make the language better), but, as happens in other situations, the support is only limited: the compiler allows you to write


alias immutable(int)[] number;
number my_number = [0, 3.14, 3.14l];

but there is no support in the runtime library. On the surface, this looks as if the language has a type which supports the wanted feature, but if you use it, you will face problems: my_number[2] would give a strange integer instead of the float 3.14, and the length of this array is 4 instead of 3. In addition, this is not a type-safe approach, because there is the option to reinterpret a double as two integers.

Now, in order to fix this, there are a number of functions in the library which treat the number type properly. Such practice (limited and broken language support, an unsafe and illogical type, functions to correct design mistakes) is so deeply embedded that anyone who points out this problem in the newsgroup is treated as a fool and is sent to study the IEEE 754 standard.
