Hi,

On Wed, Mar 13 2019 20:35:09 +0100, Hiltjo Posthuma wrote:
> I don't like mixing of the existing functions with wchar_t.
> I think st should (at the very least internally) use utf-8.

I think I explained my position poorly, so let me try to clarify. My
apologies if this seems a bit pushy :)

First, I agree with using UTF-8. That's actually how I ended up with
this diff: I was trying to configure U+3000 IDEOGRAPHIC SPACE as a
delimiter, but seeing that worddelimiters was char *, I started
wondering whether I could actually use Unicode characters in it and had
to go read the code, thus finding utf8strchr().

utf8strchr() is a bit peculiar: on every call to ISDELIM(), it decodes
the worddelimiters UTF-8 string into Runes (so that it can compare each
one to the Rune argument). It seems a little strange to me to be doing
that. The delimiters string cannot change at runtime, so storing the
codepoints instead of the multibyte string feels like a better fit. And
that's what wchar_t * is, with the added bonus that we can use libc
wcschr() instead of rolling our own search function.

I already mentioned that Rune is being passed to wcwidth(wchar_t), so it
seems like there is a built-in assumption that Rune and wchar_t hold
equivalent values. I actually don't understand why that typedef exists
instead of just using wchar_t; maybe I'm missing something.

Could you explain what it is that you don't like about wchar_t?

-- 
Lauri Tirkkonen | lotheac @ IRCnet
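
P.S. To make the wcschr() idea concrete, here is a rough sketch of what
I mean. It is not the actual diff: isdelim() is a made-up name standing
in for ISDELIM(), the delimiter value is just my U+3000 example, and the
Rune typedef is written the way I understand st declares it. It also
assumes that wchar_t holds Unicode codepoints, which is the same
assumption the wcwidth() call already makes.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Assumption: this matches st's Rune typedef. */
typedef uint_least32_t Rune;

/* Example config value: ASCII space plus U+3000 IDEOGRAPHIC SPACE,
 * stored as codepoints instead of as a UTF-8 byte string. */
static const wchar_t *worddelimiters = L" \u3000";

/* Hypothetical stand-in for ISDELIM(). No decoding happens per call;
 * the Rune is compared against the stored codepoints directly.
 * The u != 0 check is needed because wcschr() also matches the
 * terminating L'\0' of the string. */
static int
isdelim(Rune u)
{
	return u != 0 && wcschr(worddelimiters, (wchar_t)u) != NULL;
}
```

With this, isdelim(0x3000) and isdelim(' ') are true, while isdelim('a')
and isdelim(0) are false.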
