Re: [9fans] Why does utfutf() exist?

quiekaizam via 9fans Thu, 18 Dec 2025 08:18:15 -0800

>  I would assume converting to a rune would turn out the same either way:


This sounds wrong to me. IIUC, Runes are just Unicode code points. A character
may have multiple representations in Unicode, and your ü is a good example.
Mapping those representations onto one another is a matter of Unicode
normalization, however, which involves lots of fiddly questions whose answers
depend on the particular use case. So conversion to Runes cannot reasonably
perform normalization, AFAIU.

On Dec 18, 2025, at 18:53:35 JST, Shawn Rutledge <[email protected]> wrote:
>> On Dec 17, 2025, at 22:17, Jacob Moody <[email protected]> wrote:
>> 
>> I've been poking at some of the utf* functions lately and utfutf is a bit 
>> puzzling.
>> At face value, strstr() should be sufficient for handling utf8 encoded 
>> strings just as strcmp() is.
> 
> Maybe normalization could be the reason: there can be multiple 
> representations, for example, ü might be one code point (Unicode: U+00FC, 
> UTF-8: C3 BC), or might be u with a combining umlaut.  I would assume 
> converting to a rune would turn out the same either way: then you can compare 
> them even if the haystack is represented one way in utf8 and the needle is 
> the other way.  (Disclaimer: I’m not a unicode expert, even less so on 9)
> 

------------------------------------------
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T8831073f8b8bb351-Mb71f0b6c34b98f89c7952434