Thank you very much for your inputs, guys. So, based on the discussion, I
will make the following changes.

1. ASCII reverse would throw an error when a non-ASCII (valid/invalid
utf8) byte is observed (no change)
2. UTF8 kernel would return garbage output when an invalid utf8 char is
observed (no change)
Thank you @antoine for the clarification.
3. Edit the documentation to clarify that the kernel works at the code-point level

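To make points 1 and 2 concrete, here is a minimal pure-Python sketch of the intended behaviour. It is not the Arrow C++ kernel, and the function names `ascii_reverse_sketch` and `utf8_reverse_sketch` are hypothetical. The ASCII variant rejects any byte >= 0x80. The UTF-8 variant assumes valid input and reverses whole code points by walking back over continuation bytes (10xxxxxx). It does no validation, so invalid input yields unspecified output rather than an error.

```python
def ascii_reverse_sketch(data: bytes) -> bytes:
    # Hypothetical sketch: any byte >= 0x80 (valid or invalid UTF-8) is an error.
    for i, b in enumerate(data):
        if b >= 0x80:
            raise ValueError(f"non-ASCII byte 0x{b:02x} at offset {i}")
    return data[::-1]


def utf8_reverse_sketch(data: bytes) -> bytes:
    # Hypothetical sketch: reverse code points, assuming valid UTF-8.
    # No validation is done, so invalid input gives unspecified output.
    out = bytearray()
    end = len(data)
    while end > 0:
        start = end - 1
        # Step back over continuation bytes to the lead byte of this code point.
        while start > 0 and (data[start] & 0xC0) == 0x80:
            start -= 1
        out += data[start:end]  # copy the whole code point unchanged
        end = start
    return bytes(out)


print(utf8_reverse_sketch("héllo".encode("utf-8")).decode("utf-8"))  # olléh
```

The code-point boundary is the only unit this sketch respects. Anything larger (grapheme clusters) is not preserved.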
On Mon, May 17, 2021 at 11:31 AM Antoine Pitrou <anto...@python.org> wrote:

>
> I'm fine with pointing out that the function operates on codepoints.
>
> Linking to the Unicode documentation for emojis sounds entirely like a
> distraction, though.
>
> Regards
>
> Antoine.
>
>
> Le 17/05/2021 à 17:28, Ian Cook a écrit :
> > +1 for clarifying this in the kernel documentation, referring to these
> > multi-emoji glyphs as "emoji ZWJ sequences," and linking to
> > https://unicode.org/emoji/charts/emoji-zwj-sequences.html
> >
> > Ian
> >
> >
> > On Mon, May 17, 2021 at 11:21 AM Antoine Pitrou <anto...@python.org>
> wrote:
> >>
> >>
> >> Le 17/05/2021 à 17:17, David Li a écrit :
> >>> A little clarification on my point: it's not that a single codepoint
> >>> gets encoded with more than four bytes, it's that a grapheme
> >>> cluster/human-delimited 'character' might be multiple codepoints, so
> >>> reversing the individual codepoints may produce an unexpected
> >>> result. For instance a flag emoji is actually two codepoints (two
> >>> special 'letter' codepoints that represent the country code), so
> >>> reversing a US flag naively will give you an odd '[SU]' instead.
> >>
> >> This sounds like saying that reversing a valid French word does not
> >> produce a valid French word (well, in most cases). The kernel
> >> documentation can't contain an entire tutorial about Unicode characters
> >> and what to expect from them, IMHO.
> >>
> >> Regards
> >>
> >> Antoine.
>
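
A short illustration of David's flag example, using plain Python (where `str` slicing already operates on code points). This is only a sketch of what a code-point-level reverse does, not Arrow code. The helper name `reverse_codepoints` is hypothetical.

```python
import unicodedata


def reverse_codepoints(s: str) -> str:
    # Python strings index by code point, so this mirrors a code-point-level reverse.
    return s[::-1]


# The US flag is two code points: REGIONAL INDICATOR U followed by S.
us_flag = "\U0001F1FA\U0001F1F8"
rev = reverse_codepoints(us_flag)
print([unicodedata.name(c) for c in rev])
# ['REGIONAL INDICATOR SYMBOL LETTER S', 'REGIONAL INDICATOR SYMBOL LETTER U']

# A combining sequence behaves similarly: the accent ends up leading the string.
print(reverse_codepoints("cafe\u0301") == "\u0301efac")  # True
```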


-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>
