2025/01/18 10:15 ... Mouse:
There is no one-size-fits-all character set, nor encoding, nor even
serialization, no matter what the priests of the UTF-8 religion would
have you believe.
If you want to argue that UTF-8 is the best default, that at least is
worth discussing. But maintaining that there is any single "the right"
character set, encoding, or serialization is...nonsense. There is, at
most, "right" for a particular use case, or set of use cases.
ASCII is ugly; Latin-1 is ugly (at the defining meeting, the delegate from
France, no printer, no linguist, no typographer, repudiated the letter Œ
against his country's tradition. Of course, another member jumped at
that and proposed the multiplication and division signs instead, so there
is now a hole in the added letters); Unicode is ugly. But UTF-8 is
particularly ugly. In the three-byte sequences it forces on most scripts,
each byte averages barely five message bits against nearly three of
overhead (16 payload bits in 24), and writers in Devanagari, ...
Malayalam, ... Hangul, ... hiragana & katakana, ... above all in
Chinese, find that their text files are huge, half again as big as if
entered in a 16-bit code (24-bit code, anyone?) with all its surrogates.
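A quick check of that size claim in Python; the sample string is my own, purely illustrative:

```python
# CJK code points in the BMP (U+0800..U+FFFF) take three bytes in
# UTF-8 but only two in a 16-bit code such as UTF-16.
text = "你好，世界"  # "Hello, world" in Chinese: five code points
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")  # big-endian, no BOM
print(len(utf8), len(utf16))  # 15 bytes vs 10 bytes
```

For pure BMP CJK text that is a steady 50% size penalty for UTF-8.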
In bijective base 128 ("Bijective numeration" in Wikipedia), using seven
data bits and one marker bit per byte, every Unicode code point fits in
at most three bytes, with inherent uniqueness: the overlong forms that
UTF-8 must outlaw cannot even be written. Anyone interested in losing
UTF-8's deliberate redundancies?
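A minimal sketch of such a scheme, assuming bijective base 128 with the marker bit set on every byte except the last; the post does not pin down the bit layout, so those details are mine:

```python
BASE = 128  # 7 data bits per byte; the 8th bit is a continuation marker

def encode(cp: int) -> bytes:
    """Encode a code point as the bijective base-128 digits of cp + 1.

    Bijective digits run 1..128 (never 0), so every integer has exactly
    one digit string: overlong forms are not representable at all.
    """
    m = cp + 1  # shift by one so that code point 0 is encodable
    digits = []
    while m > 0:
        d = m % BASE
        if d == 0:
            d = BASE
        digits.append(d - 1)        # store digit - 1 in the low 7 bits
        m = (m - d) // BASE
    digits.reverse()                # most significant digit first
    # marker bit on every byte except the final one
    return bytes(b | 0x80 for b in digits[:-1]) + bytes(digits[-1:])

def decode(data: bytes) -> int:
    m = 0
    for b in data:
        m = m * BASE + (b & 0x7F) + 1
    return m - 1

# All of Unicode (up to U+10FFFF) fits in at most three bytes,
# and decoding round-trips exactly:
assert len(encode(0x10FFFF)) == 3
assert all(decode(encode(cp)) == cp for cp in (0, 0x7F, 0x4E2D, 0x10FFFF))
```

A pleasant side effect of this particular layout is that the 128 ASCII code points encode to themselves as single bytes, so plain ASCII text passes through unchanged.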