On Wednesday, 1 June 2016 at 14:29:58 UTC, Andrei Alexandrescu wrote:
> On 06/01/2016 06:25 AM, Marc Schütz wrote:
>> On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu wrote:
>>> The point is to operate on representation-independent entities
>>> (Unicode code points) instead of low-level representation-specific
>>> artifacts (code units).
>>
>> _Both_ are low-level representation-specific artifacts.
>
> Maybe this is a misunderstanding. Representation = how things are laid out in memory. What does associating numbers with various Unicode symbols have to do with representation? --

OK, if you define it that way, sure. I was thinking in terms of the actual text: Unicode offers a variety of low-level representations for that text: UTF8/NFC, UTF8/NFD, unnormalized UTF8, UTF16 big- and little-endian (times normalization), UTF32 (times normalization), and some more obscure ones. From that viewpoint, an auto-decoded char[] (= UTF8) is equivalent to a dchar[] (= UTF32). Neither of them is the actual text.
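To illustrate (in Python rather than D, purely as a neutral sketch): the same one-character text "é" comes out as four different byte sequences depending on which normalization form and encoding you pick, yet all four are representations of the same text.

```python
import unicodedata

# One unit of writing; written as an escape to avoid source-encoding issues.
text = "\u00e9"  # é

# Two normalization forms of the same text:
nfc = unicodedata.normalize("NFC", text)   # U+00E9 (precomposed)
nfd = unicodedata.normalize("NFD", text)   # U+0065 U+0301 (decomposed)

print(nfc.encode("utf-8"))      # b'\xc3\xa9'  -> 2 UTF-8 code units
print(nfd.encode("utf-8"))      # b'e\xcc\x81' -> 3 UTF-8 code units
print(nfc.encode("utf-16-le"))  # b'\xe9\x00'  -> 1 UTF-16 code unit
print(len(nfd.encode("utf-32-le")))  # 8 bytes -> 2 UTF-32 code units

# Four distinct byte sequences, one text.
```

The point carries over directly: char[] vs. dchar[] in D is the same kind of distinction as the UTF-8 vs. UTF-32 rows above, i.e. a choice among representations, not the text itself.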

Both writing and the memory representation consist of fundamental units. But there is no 1:1 relationship between the units of char[] (UTF8 code units) or auto-decoded strings (Unicode code points) on the one hand, and the units of writing (graphemes) on the other.
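A concrete sketch of that mismatch (again in Python as a stand-in; the counts are properties of Unicode, not of the language): a decomposed "é" is one grapheme but two code points and three UTF-8 code units, while an astral-plane character is one grapheme and one code point but four UTF-8 code units.

```python
# Decomposed "é": 'e' + U+0301 COMBINING ACUTE ACCENT.
s = "e\u0301"
print(len(s.encode("utf-8")))   # 3 UTF-8 code units
print(len(s))                   # 2 Unicode code points
# Grapheme count is 1, but computing it requires the Unicode
# text-segmentation rules (UAX #29), which the stdlib doesn't provide.

# An astral-plane character (outside the BMP):
t = "\U0001F44D"                # THUMBS UP SIGN
print(len(t.encode("utf-8")))   # 4 UTF-8 code units
print(len(t))                   # 1 code point
print(len(t.encode("utf-16-le")) // 2)  # 2 UTF-16 code units (surrogate pair)
```

So iterating by code unit and iterating by code point each split the text at boundaries that need not coincide with grapheme boundaries, which is the sense in which neither level is "the actual text."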
