On Wednesday, December 06, 2017 09:34:48 Ola Fosheim Grøstad via Digitalmars-d-learn wrote:
> On Wednesday, 6 December 2017 at 09:24:33 UTC, Jonathan M Davis wrote:
> > UTF-32 on the other hand is guaranteed to have a code unit be a
> > full code point.
>
> I don't think the standard says that? Isn't this only because the
> current set is small enough to fit? So this may change as Unicode
> grows?

It's most definitely the case right now, and given how Unicode decoding works, I don't see how it could ever be the case that a UTF-32 code unit would not be a code point - not without breaking all of the Unicode handling in existence. And per Wikipedia's short article on code points

----------------
The Unicode code space is divided into seventeen planes (the basic multilingual plane, and 16 supplementary planes), each with 65,536 (= 2^16) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.
----------------

And uint.max is 4,294,967,295, so a 32-bit code unit can hold about 3855 times as many values as the current code space. That leaves plenty of room to grow into even if they kept adding more code point values by adding more planes or however that works. I'd have to go digging through the actual standard to know for sure what it actually guarantees though.

- Jonathan M Davis
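
For what it's worth, the arithmetic in the reply above can be checked at compile time in D. This is only a sketch: the names `planes`, `pointsPerPlane`, `codeSpace`, `values32`, and `headroom` are made up for illustration, and the figures come from the Wikipedia quote, not from the Unicode standard itself.

```d
import std.stdio : writefln;

// Figures taken from the Wikipedia quote (illustrative names, not a library API).
enum planes = 17;
enum pointsPerPlane = 65_536;               // 2^16
enum codeSpace = planes * pointsPerPlane;   // total number of code points

static assert(codeSpace == 1_114_112);

// D's dchar.max is the last code point, U+10FFFF, i.e. codeSpace - 1.
static assert(dchar.max == codeSpace - 1);

// Number of distinct values a 32-bit code unit can hold (2^32).
enum ulong values32 = cast(ulong) uint.max + 1;

// How many times the current code space fits into 32 bits (integer division).
enum headroom = values32 / codeSpace;
static assert(headroom == 3855);

void main()
{
    writefln("code space: %s, fits %s times into 32 bits", codeSpace, headroom);
}
```

All of the checks are `static assert`s, so a wrong figure would fail at compile time rather than at run time. None of this says anything about what the standard guarantees; it only confirms the numbers.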