On 12/2/22 13:18, thebluepandabear wrote:

> But I don't really understand this? What does it mean that it 'must be
> represented by at least 2 bytes'?

The integral value (code point) of Ğ in Unicode is 286, written U+011E in hexadecimal.

  https://unicodeplus.com/U+011E

Since 'char' is 8 bits, its largest value is 255, so it cannot store 286.

At first, that sounds like a hopeless situation, making one think that Ğ cannot be represented in a string. Encoding comes to the rescue: in UTF-8, which is the encoding of D's string, Ğ is encoded as 2 chars:

import std.stdio;

void main() {
    foreach (c; "Ğ") {
        writefln!"%b"(c);
    }
}

That program prints

11000100
10011110

Articles like the following explain well how that second byte is a continuation byte:

  https://en.wikipedia.org/wiki/UTF-8#Encoding

(It's a continuation byte because it starts with the bits 10).
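
To see how those two bytes carry the value 286, here is a minimal sketch that decodes them by hand. decodeTwoBytes is a hypothetical helper written only for illustration; it handles 2-byte sequences and nothing else. Real code would use something like std.utf.decode instead.

```d
import std.stdio;

// Hypothetical helper (not part of Phobos): decodes a 2-byte UTF-8 sequence.
uint decodeTwoBytes(char lead, char continuation) {
    // A 2-byte lead looks like 110xxxxx; a continuation byte looks like 10xxxxxx.
    assert((lead & 0b1110_0000) == 0b1100_0000);
    assert((continuation & 0b1100_0000) == 0b1000_0000);

    // Keep the 5 payload bits of the lead and the 6 payload bits of the
    // continuation, then join them into a single 11-bit value.
    return ((lead & 0b0001_1111) << 6) | (continuation & 0b0011_1111);
}

void main() {
    string s = "Ğ";
    writeln(decodeTwoBytes(s[0], s[1]));  // prints 286
}
```

The payload bits are 00100 from the first byte and 011110 from the second; together they make 00100011110, which is 286.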

> I don't think it was explained well in
> the book.

Coincidentally, according to other recent feedback I received, Unicode and UTF are introduced way too early for such a book. I agree. I hadn't understood a single thing the first time smart people tried to explain Unicode and UTF encodings at the company where I worked. I had years of programming experience back then. (Although I now think the instructors were not really good, and the company was pretty bad as well. :) )

> Any help would be appreciated.

I recommend the Wikipedia page I linked above. It is enlightening to see how about 150K Unicode characters can be encoded with units of 8 bits.
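
As a small illustration of that variable-length scheme, the following sketch asks std.utf.codeLength how many chars UTF-8 needs for a few characters. The sample characters are my own picks for illustration; the program only reports lengths and does no encoding itself.

```d
import std.stdio;
import std.utf : codeLength;

void main() {
    // Number of 8-bit code units (chars) needed in UTF-8 for each character.
    writeln(codeLength!char('A'));   // prints 1
    writeln(codeLength!char('Ğ'));   // prints 2
    writeln(codeLength!char('€'));   // prints 3
    writeln(codeLength!char('😀'));  // prints 4
}
```

The higher the code point, the more chars it takes, up to 4.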

You can safely ignore wchar, dchar, wstring, and dstring for daily coding. Only special programs may need to deal with those types. 'char' and string are what we need and do use predominantly in D.
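
One thing worth keeping in mind when sticking with string: its .length counts chars (UTF-8 code units), not characters. A minimal sketch, assuming std.utf.count is an acceptable way to count characters for your purposes:

```d
import std.stdio;
import std.utf : count;

void main() {
    string s = "Ğ";
    writeln(s.length);  // prints 2: the number of chars (code units)
    writeln(s.count);   // prints 1: the number of Unicode code points
}
```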

Ali
