Re: How to print unicode characters (no library)?

ag0aep6g via Digitalmars-d-learn Mon, 27 Dec 2021 22:51:35 -0800

On 27.12.21 15:23, Adam D Ruppe wrote:

Let's look at:


"Hello 😂\n";

[...]

Finally, there's "string", which is utf-8, meaning each element is 8 bits, but again, there is a buffer you need to build up to get the codepoints you feed into that VM.

[...]

H, e, l, l, o, <space>, <next point is combined by these bits PLUS THREE MORE elements>, <this is a work-in-progress element and needs two more>, <this is a work-in-progress element and needs one more>, <this is the final work-in-progress element>, <new line>

[...]

Notice how each element here told you how many elements are left. This is encoded into the bit pattern and is part of why it took 4 elements instead of just three; there's some error-checking redundancy in there. This is a nice part of the design allowing you to validate a utf-8 stream more reliably and even recover if you jumped somewhere in the middle of a multi-byte sequence.

It's actually just the first byte that tells you how many are in the sequence. The continuation bytes don't have redundancies for that.
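
To illustrate, here is a minimal sketch of reading the length off the first byte. The name `sequenceLength` is made up for this example and is not a Phobos function; for real code, `std.utf.stride` should cover the same job. The sketch only looks at the leading bit pattern and does not do full validation.

```d
/// Length of the UTF-8 sequence announced by lead byte `first`, or 0 if
/// `first` is a continuation byte or otherwise cannot start a sequence.
/// Sketch only: it does not reject the lead bytes 0xC0, 0xC1 and
/// 0xF5 to 0xF7, which valid UTF-8 never uses.
size_t sequenceLength(ubyte first)
{
    if ((first & 0b1000_0000) == 0b0000_0000) return 1; // 0xxxxxxx: ASCII
    if ((first & 0b1110_0000) == 0b1100_0000) return 2; // 110xxxxx
    if ((first & 0b1111_0000) == 0b1110_0000) return 3; // 1110xxxx
    if ((first & 0b1111_1000) == 0b1111_0000) return 4; // 11110xxx
    return 0; // 10xxxxxx (continuation) or an invalid lead byte
}

unittest
{
    import std.string : representation;

    auto bytes = "Hello 😂\n".representation; // raw UTF-8 code units
    assert(bytes.length == 11);               // 6 ASCII + 4 for the emoji + '\n'
    assert(sequenceLength(bytes[0]) == 1);    // 'H'
    assert(sequenceLength(bytes[6]) == 4);    // 0xF0 starts the emoji
    assert(sequenceLength(bytes[7]) == 0);    // 0x9F only says "I am a continuation"
}
```

A continuation byte always has the form 10xxxxxx, whichever position it holds in the sequence, so it cannot tell you how many more bytes follow.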

To recover from the middle of a sequence, you just skip the orphaned continuation bytes one at a time.
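
A rough sketch of that recovery, assuming the data is available as a slice of raw bytes. `isContinuation` and `resync` are names invented for this example, not library functions.

```d
/// True for bytes of the form 10xxxxxx.
bool isContinuation(ubyte b)
{
    return (b & 0b1100_0000) == 0b1000_0000;
}

/// Returns the first index at or after `start` whose byte is not a
/// continuation byte, or `data.length` if there is none.
size_t resync(const(ubyte)[] data, size_t start)
{
    size_t i = start;
    while (i < data.length && isContinuation(data[i]))
        ++i; // skip one orphaned continuation byte at a time
    return i;
}

unittest
{
    import std.string : representation;

    auto bytes = "Hello 😂\n".representation;
    // Index 8 is the third byte of the emoji (0x98), in the middle of the sequence.
    assert(resync(bytes, 8) == 10); // lands on '\n'
    // Starting on a byte that already begins a sequence skips nothing.
    assert(resync(bytes, 0) == 0);
}
```

The interrupted character itself is lost; decoding simply resumes at the next byte that can start a sequence.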
