On Wednesday, 1 June 2016 at 14:29:58 UTC, Andrei Alexandrescu wrote:
> On 06/01/2016 06:25 AM, Marc Schütz wrote:
>> On Tuesday, 31 May 2016 at 21:01:17 UTC, Andrei Alexandrescu wrote:
>>> The point is to operate on representation-independent entities
>>> (Unicode code points) instead of low-level representation-specific
>>> artifacts (code units).
>>
>> _Both_ are low-level representation-specific artifacts.
>
> Maybe this is a misunderstanding. Representation = how things
> are laid out in memory. What does associating numbers with
> various Unicode symbols have to do with representation? --
Ok, if you define it that way, sure. I was thinking in terms of
the actual text: Unicode provides a variety of low-level
representations of that text: UTF8/NFC, UTF8/NFD, unnormalized
UTF8, UTF16 big/little endian x normalization, UTF32 x
normalization, and some other more obscure ones. From that
viewpoint, auto-decoded char[] (= UTF8) is equivalent to dchar[]
(= UTF32). Neither of them is the actual text.
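For what it's worth, the normalization point is easy to demonstrate outside of D; a small Python sketch (using only the standard unicodedata module) shows the same text "é" living in two canonical-equivalent representations:

```python
import unicodedata

# The same text ("é") in two canonical-equivalent forms:
nfc = unicodedata.normalize("NFC", "\u00e9")  # precomposed: U+00E9
nfd = unicodedata.normalize("NFD", "\u00e9")  # decomposed: 'e' + U+0301

# Different code point counts...
print(len(nfc), len(nfd))                          # 1 2
# ...and different UTF8 code unit sequences...
print(nfc.encode("utf-8"), nfd.encode("utf-8"))    # b'\xc3\xa9' b'e\xcc\x81'
# ...yet both denote the same text: equal after canonical normalization.
print(unicodedata.normalize("NFC", nfd) == nfc)    # True
```

So even at the code point level (what auto-decoding gives you), two strings spelling the same text need not compare equal.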
Both writing and the memory representation consist of fundamental
units. But there is no 1:1 relationship between the units of
char[] (UTF8 code units) or auto-decoded strings (Unicode code
points) on the one hand, and the units of writing (graphemes) on
the other.
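To make the mismatch concrete: a sketch in Python (proper grapheme segmentation per UAX #29 needs a library, or std.uni.byGrapheme in D, but the unit counts alone show there is no 1:1 mapping):

```python
# A single grapheme: "é" written as base 'e' + combining acute U+0301.
g = "e\u0301"
print(len(g))                  # 2 code points, not 1
print(len(g.encode("utf-8")))  # 3 UTF8 code units, not 1

# Operating on code points still breaks graphemes: reversing "abcé"
# code-point-wise detaches the combining accent from its base 'e'.
print("abce\u0301"[::-1] == "\u0301ecba")  # True
```

Auto-decoding pays the cost of decoding to code points without reaching the unit that actually corresponds to what the user sees.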