On Sunday, 26 December 2021 at 21:22:42 UTC, Adam Ruppe wrote:
On Sunday, 26 December 2021 at 20:50:39 UTC, rempas wrote:
[...]
write just transfers a sequence of bytes. It doesn't know nor
care what they represent - that's for the receiving end to
figure out.
[...]
You are mistaken. There are several exceptions: UTF-16 code
units can come in pairs, and even UTF-32 has multiple
"characters" that combine into one thing on screen.
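A minimal D sketch of both exceptions, assuming Phobos is available. The helper name graphemeCount is hypothetical, introduced only for this illustration; byGrapheme and walkLength are the standard std.uni and std.range functions.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helper: counts user-perceived characters (grapheme clusters).
size_t graphemeCount(S)(S s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    // U+1F600 lies outside the Basic Multilingual Plane,
    // so UTF-16 encodes it as a surrogate pair.
    wstring emoji = "\U0001F600"w;
    writeln(emoji.length);            // 2 UTF-16 code units

    // 'e' followed by U+0301 COMBINING ACUTE ACCENT:
    // two code points that normally render as one glyph.
    dstring accented = "e\u0301"d;
    writeln(accented.length);         // 2 UTF-32 code units
    writeln(graphemeCount(accented)); // 1 grapheme cluster
}
```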
I prefer to think of a string as a little virtual machine that
can be run to produce output rather than actually being
"characters". Even with plain ascii, consider the backspace
"character" - it is more an instruction to go back than it is a
thing that is displayed on its own.
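The "virtual machine" idea can be sketched as a toy interpreter. This is only an illustration under loud assumptions: render is a hypothetical helper, it handles plain ASCII only, and it models backspace as "move the cursor left by one", which is roughly what many terminals do but is not guaranteed anywhere.

```d
import std.stdio : writeln;

// Hypothetical helper: "runs" an ASCII string and returns what a very
// simple screen would end up showing on a single line.
string render(string input)
{
    char[] screen;     // characters currently on the line
    size_t cursor = 0; // position where the next character lands

    foreach (char c; input)
    {
        if (c == '\b')
        {
            // Backspace is an instruction: move left, display nothing.
            if (cursor > 0)
                --cursor;
        }
        else
        {
            // Any other byte overwrites at the cursor or extends the line.
            if (cursor < screen.length)
                screen[cursor] = c;
            else
                screen ~= c;
            ++cursor;
        }
    }
    return screen.idup;
}

void main()
{
    writeln(render("abc\bd")); // abd
    writeln("abc\bd".length);  // 5: the string still holds five bytes
}
```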
[...]
This is because the *receiving program* treats the bytes as
UTF-8 and runs them accordingly. Not all terminals will
necessarily do this, and programs you pipe to can do it very
differently.
[...]
The [w|d|]string.length property returns the number of elements
(code units) in there: bytes for string, 16-bit elements for
wstring (so bytes / 2), or 32-bit elements for dstring (so
bytes / 4).
This is not necessarily related to the number of characters
displayed.
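A short D sketch of those three counts for the same text. The helper byteSize is hypothetical, added here only to show the bytes / 2 and bytes / 4 relationship; the literal uses an escape for é so the result does not depend on how the source file is saved.

```d
import std.stdio : writeln;

// Hypothetical helper: size of the string's contents in bytes.
size_t byteSize(S)(S s)
{
    return s.length * typeof(s[0]).sizeof;
}

void main()
{
    // U+00E9 (é) takes 2 bytes in UTF-8 but a single code unit
    // in both UTF-16 and UTF-32.
    string  s = "h\u00E9llo";
    wstring w = "h\u00E9llo"w;
    dstring d = "h\u00E9llo"d;

    writeln(s.length);   // 6 (bytes)
    writeln(w.length);   // 5 (16-bit elements)
    writeln(d.length);   // 5 (32-bit elements)

    writeln(byteSize(w)); // 10, so w.length == bytes / 2
    writeln(byteSize(d)); // 20, so d.length == bytes / 4
}
```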
[...]
Yes, it just passes bytes through. It doesn't know they are
supposed to be characters...
I think that mental model is pretty good, actually. Maybe a more
specific idea exists, but this virtual machine concept does
tell the new programmer to expect dragons - or at least that
the days of plain ASCII are long gone (and never really
existed, given backspace, as you say).