On 17/05/2021 at 17:17, David Li wrote:
A little clarification on my point: it's not that a single codepoint gets encoded with more than four bytes, it's that a grapheme cluster (a human-delimited 'character') might be multiple codepoints, so reversing the individual codepoints may produce an unexpected result. For instance, a flag emoji is actually two codepoints (two special 'letter' codepoints that represent the country code), so naively reversing a US flag will give you an odd '[SU]' instead.
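A minimal illustration of the flag case above, assuming Python, where `str` indexing and slicing operate on codepoints rather than grapheme clusters. The US flag is written as the two regional indicator codepoints U+1F1FA ("U") and U+1F1F8 ("S"). Each one fits in four UTF-8 bytes, yet reversing them per codepoint yields the sequence "S", "U", which does not form the US flag:

```python
# The US flag emoji: regional indicator "U" (U+1F1FA) followed by "S" (U+1F1F8).
us_flag = "\U0001F1FA\U0001F1F8"

# One visible "character", but two codepoints.
print(len(us_flag))  # 2

# Each codepoint still encodes to exactly four UTF-8 bytes (no codepoint needs more).
print([len(c.encode("utf-8")) for c in us_flag])  # [4, 4]

# Naive codepoint-level reversal: slicing reverses codepoints, not grapheme clusters.
reversed_flag = us_flag[::-1]
print([f"U+{ord(c):04X}" for c in reversed_flag])  # ['U+1F1F8', 'U+1F1FA']
```

Reversing by grapheme cluster instead would need a segmentation step (for example following Unicode's text segmentation rules), which the standard `str` type does not provide.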
This sounds like saying that reversing a valid French word does not produce a valid French word (well, in most cases). The kernel documentation can't contain an entire tutorial about Unicode characters and what to expect from them, IMHO.
Regards,
Antoine.