All this talk about these higher-plane characters - you know, plane
1 and above; let's call them MathText characters for short - has got
me wondering.

Why is there no UTF-24?

See, these MathText characters take up a lot of space. No matter how
you encode them (UTF-8, UTF-16 or UTF-32), they are always 4 bytes
long. Now if we had UTF-24, they would only take up 3 bytes.
And since the Unicode code point range is formally defined to run no
higher than U+10FFFF, which fits in 3 bytes, I see no reason why
no-one has ever gone to the trouble of defining a 3-byte storage
method.
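
To make the sizes concrete, here is a quick check in Python (my choice
of language, nothing here depends on it). The sample character U+1D400
is just one plane 1 character picked for illustration; the last line
only confirms that values this high need 21 bits, so 3 bytes are
enough.

```python
# Encoded size of one plane 1 character in each of the existing UTFs.
ch = "\U0001D400"  # MATHEMATICAL BOLD CAPITAL A, picked as an example

sizes = {
    "utf-8": len(ch.encode("utf-8")),
    # The -le variants are used so that no byte order mark is counted.
    "utf-16-le": len(ch.encode("utf-16-le")),
    "utf-32-le": len(ch.encode("utf-32-le")),
}
print(sizes)

# Number of bits needed for the top of the code space.
bits_needed = (0x10FFFF).bit_length()
print(bits_needed)
```
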
Implementation would be easy: there would be only two variants,
UTF-24LE and UTF-24BE, and that's it. No juggling with bits like in
UTF-8 and UTF-16 or anything complicated like that. Just the plain
character values, just like in UTF-32, only with 75% of the storage
needed.
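
To show what I mean, here is a rough sketch of how such a thing might
look. Keep in mind that UTF-24 does not exist; the function names
utf24_encode and utf24_decode and the decision to reject surrogates
are my own assumptions, not anything taken from a standard.

```python
def utf24_encode(text, byteorder="big"):
    """Encode text as hypothetical UTF-24: 3 bytes per code point."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # Assumption: surrogates are rejected, as in the real UTFs.
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code points are not encodable")
        out += cp.to_bytes(3, byteorder)
    return bytes(out)


def utf24_decode(data, byteorder="big"):
    """Decode hypothetical UTF-24 bytes back into a string."""
    if len(data) % 3 != 0:
        raise ValueError("length must be a multiple of 3 bytes")
    chars = []
    for i in range(0, len(data), 3):
        cp = int.from_bytes(data[i:i + 3], byteorder)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code point {cp:#x}")
        chars.append(chr(cp))
    return "".join(chars)


sample = "A\U0001D400"                   # one BMP and one plane 1 character
be = utf24_encode(sample, "big")         # the "UTF-24BE" variant
le = utf24_encode(sample, "little")      # the "UTF-24LE" variant
utf32 = sample.encode("utf-32-be")       # for the size comparison
print(len(be), len(utf32))
```

For this two-character sample the UTF-24 form is 6 bytes against 8 for
UTF-32, which is the 75% figure.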

Comments anyone?

Pim Blokland

