Re: Why is string.front dchar?

Jakob Ovrum Wed, 15 Jan 2014 22:01:27 -0800

On Tuesday, 14 January 2014 at 11:42:34 UTC, Maxim Fomin wrote:

The root of the issue is that string literals containing characters which do not fit into a single byte are still converted to a char[] array. This is, strictly speaking, not typesafe, because it allows a 2- or 4-byte code unit to be reinterpreted as a sequence of 1-byte characters. The string type is in some sense problematic in D. That's why the fact that .front returns dchar is a way to correct the problem; it is not an attempt to introduce confusion.


This assertion makes all the wrong assumptions.

`char` is a UTF-8 code unit[1], and `string` is an array of immutable UTF-8 code units. The whole point of UTF-8 is the ability to encode code points that need multiple bytes (UTF-8 code units), so the string literal behaviour is perfectly regular.
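A minimal sketch of that behaviour, assuming a current D compiler; the word used is only an illustration, and the non-ASCII letter is written as an escape so the example does not depend on how the source file is saved:

```d
void main()
{
    // U+00E9 needs two UTF-8 code units, so this literal holds five chars.
    string s = "caf\u00E9";

    // Each element of a string is an immutable UTF-8 code unit.
    static assert(is(typeof(s[0]) == immutable(char)));

    // .length counts code units, not code points.
    assert(s.length == 5);

    // The last two elements are the code units that together encode U+00E9.
    assert(s[3] == 0xC3 && s[4] == 0xA9);
}
```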

Operations on code units are rare, which is why the standard library instead treats strings as ranges of code points, for correctness by default. However, we must not prevent the user from being able to work on arrays of code units, as many string algorithms can be optimized by not doing full UTF decoding. The standard library does this on many occasions, and there are more to come.
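To illustrate both views of the same data, here is a short sketch. It assumes a Phobos version that provides `std.range.primitives` and uses `std.string.representation` as one way of reaching the raw code units; other routes exist.

```d
import std.range.primitives : front, popFront;
import std.string : representation;

void main()
{
    string s = "\u00E9a"; // U+00E9 followed by 'a': three code units in total

    // The range primitives decode: front yields a whole code point as dchar.
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == '\u00E9');

    // The code units remain reachable, here as immutable(ubyte)[].
    immutable(ubyte)[] raw = s.representation;
    assert(raw.length == 3);
    assert(raw[0] == 0xC3 && raw[1] == 0xA9);

    // popFront drops the whole two-unit sequence, not just one code unit.
    s.popFront();
    assert(s == "a");
}
```

An algorithm that only looks for ASCII delimiters, for example, could scan `raw` directly and skip decoding altogether.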

Note that the Unicode definition of an unqualified "character" is the translation of a code *point*, which is very different from a *glyph*, which is what people generally associate the word "character" with. Thus, `string` is not an array of characters (i.e. an array where each element is a character), but `dstring` can be said to be.
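The difference between the two array types can be sketched as follows (an illustrative example, not taken from the thread; `walkLength` is used here only to count decoded code points):

```d
import std.range : walkLength;

void main()
{
    string  s = "na\u00EFve";  // UTF-8: U+00EF takes two code units
    dstring d = "na\u00EFve"d; // UTF-32: one element per code point

    assert(s.length == 6);     // six code units
    assert(d.length == 5);     // five code points

    // Decoding the string as a range yields the same five code points.
    assert(s.walkLength == 5);

    // Indexing a dstring gives a whole code point.
    assert(d[2] == '\u00EF');
}
```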


[1] http://dlang.org/type
