I'm fine with pointing out that the function operates on codepoints.

Linking to the Unicode documentation for emojis sounds entirely like a distraction, though.

Regards

Antoine.


On 17/05/2021 at 17:28, Ian Cook wrote:
+1 for clarifying this in the kernel documentation, referring to these
multi-emoji glyphs as "emoji ZWJ sequences," and linking to
https://unicode.org/emoji/charts/emoji-zwj-sequences.html

Ian


On Mon, May 17, 2021 at 11:21 AM Antoine Pitrou <anto...@python.org> wrote:


On 17/05/2021 at 17:17, David Li wrote:
A little clarification on my point: it's not that a single codepoint
gets encoded with more than four bytes, it's that a grapheme
cluster/human-delimited 'character' might be multiple codepoints, so
reversing the individual codepoints may produce an unexpected
result. For instance, a flag emoji is actually two codepoints (two
special 'letter' codepoints that represent the country code), so
reversing a US flag naively will give you an odd '[SU]' instead.
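
To make the flag example concrete, here is a minimal Python sketch of a
codepoint-wise reversal. It is only an illustration, not the kernel's
implementation: `reverse_codepoints` is a hypothetical helper name, and it
relies on the fact that a Python `str` is a sequence of codepoints.

```python
def reverse_codepoints(text: str) -> str:
    """Reverse a string one codepoint at a time (hypothetical helper, not an Arrow API)."""
    # A Python str is a sequence of Unicode codepoints, so slicing reverses
    # codepoints, not bytes and not grapheme clusters.
    return text[::-1]


# Regional indicator symbols "U" (U+1F1FA) and "S" (U+1F1F8) together render as the US flag.
us_flag = "\U0001F1FA\U0001F1F8"

print(len(us_flag))                   # 2 codepoints
print(len(us_flag.encode("utf-8")))   # 8 bytes, 4 per codepoint

reversed_flag = reverse_codepoints(us_flag)
print([f"U+{ord(c):04X}" for c in reversed_flag])  # ['U+1F1F8', 'U+1F1FA']
```

Each codepoint survives intact and stays within four UTF-8 bytes, but the
pair now spells "SU" rather than "US", so it will typically no longer display
as the US flag.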

This sounds like saying that reversing a valid French word does not
produce a valid French word (well, in most cases). The kernel
documentation can't contain an entire tutorial about Unicode characters
and what to expect from them, IMHO.

Regards

Antoine.
