On Fri, 15 Mar 2019 12:52:04 +0100 Hiltjo Posthuma <[email protected]> wrote:
Dear Hiltjo,

> I've applied both of the patches and a small change to the default
> worddelimiters.
>
> Thanks for the clarifications. The codepoint assumption was indeed
> wrong.
>
> I do not mind wchar_t, but in practise it is not consistent across
> platforms. However we already use wchar_t in st so it should be as
> correct as possible matching the POSIX standard.
>
> (@Laslo) for simplicity/sanity sake I think assuming 1 codepoint is 1
> "character" makes sense.

Yeah, this should work. As you know, I sometimes get carried away with
these things, and the wrong assumption that 1 codepoint is 1 character
is still the common approach across the board, with very few
exceptions. It is highly unlikely that someone chooses ö as a delimiter
character and then adds it to the delimiter string as o + umlaut
modifier.

In this context, in my opinion, you made the right call. In the long
run, though, if we start a utf+unicode library project, it should be
done such that these matters are reflected properly.

With best regards

Laslo

-- 
Laslo Hunhold <[email protected]>
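
To make the ö case above concrete, here is a minimal sketch of a
per-codepoint delimiter test built on wcschr(). The names delims and
is_delim are hypothetical and this is not the code from st; it only
assumes that delimiters are matched one wchar_t at a time. The
delimiter string holds ö in decomposed form (o followed by U+0308
COMBINING DIAERESIS):

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical delimiter set: a space, then "ö" entered in decomposed
 * form, i.e. 'o' followed by U+0308 (COMBINING DIAERESIS).
 */
static const wchar_t delims[] = { L' ', L'o', (wchar_t)0x0308, 0 };

/*
 * Per-codepoint membership test. It assumes that 1 codepoint is
 * 1 character, so each wchar_t is looked up on its own.
 */
static int
is_delim(wchar_t u)
{
	return u != 0 && wcschr(delims, u) != NULL;
}
```

Under that assumption a plain o and a lone U+0308 each count as a
delimiter, while the precomposed ö (U+00F6) does not. That is the
mismatch a proper library would have to handle, and it only shows up
if someone enters the delimiter in decomposed form.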
