[CCing bug-gnulib]

Collin Funk wrote:
> On Alpine this is because iswblank in UTF-8 locales is the same as
> isblank in the C locale. This means it only returns 1 for U+0009 TAB and
> U+0020 SPACE.

It is allowed to behave this way.
https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/iswblank.html
https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/basedefs/V1_chap07.html

> I am thinking of fixing this in Gnulib since I think the proper behavior
> of iswblank in a UTF-8 locale is to return 1 for U+2002 EN SPACE and
> U+2003 EM SPACE.

While I agree that it looks better to us if iswblank returns true for
EN SPACE and EM SPACE, I don't think it's such a severe bug that Gnulib
should work around it. I see it more as a quality-of-implementation
issue. The authors of other libcs are entitled to have different
viewpoints than we have.

Otherwise, where would we stop? Should we completely override a libc's
character classifications with tables from gnulib or from glibc? For
most purposes, this would be overkill.

You can decide to make 'fold' behave as you expect, for instance by
using the function uc_is_blank (from <unictype.h>) instead of iswblank.
(Note that uc_is_blank takes a char32_t as argument, though, not a
wchar_t or wint_t.)

Bruno
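
To illustrate the last point, here is a minimal sketch of what a caller changes when it classifies char32_t values rather than wchar_t values: it decodes with mbrtoc32 and applies a predicate to the result. Both is_blank_sketch and count_leading_blanks are hypothetical names invented for this example. is_blank_sketch is only a stand-in that knows the four characters mentioned in this thread, so the block compiles without Gnulib or libunistring; it is not the definition of uc_is_blank, which a real caller would use in its place.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>   /* char32_t, mbrtoc32 */
#include <wchar.h>   /* mbstate_t */

/* Hypothetical stand-in for uc_is_blank.  It recognizes only the four
   characters discussed in this thread; the real function consults
   complete Unicode tables.  */
bool
is_blank_sketch (char32_t uc)
{
  switch (uc)
    {
    case 0x0009: /* TAB */
    case 0x0020: /* SPACE */
    case 0x2002: /* EN SPACE */
    case 0x2003: /* EM SPACE */
      return true;
    default:
      return false;
    }
}

/* Hypothetical helper: count the blank characters at the start of the
   multibyte string S of N bytes, decoding in the current locale.
   Decoding goes through mbrtoc32, so the predicate receives a char32_t
   and libc's wchar_t classification is never consulted.  */
size_t
count_leading_blanks (const char *s, size_t n)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);

  size_t count = 0;
  while (n > 0)
    {
      char32_t uc;
      size_t len = mbrtoc32 (&uc, s, n, &state);

      /* Stop at a NUL character (0) and at the special return values
         (size_t) -1, -2 and -3 (invalid, incomplete, no bytes consumed).  */
      if (len == 0 || len >= (size_t) -3)
        break;
      if (!is_blank_sketch (uc))
        break;

      count++;
      s += len;
      n -= len;
    }
  return count;
}
```

In the C locale this handles ASCII input only. After setlocale (LC_ALL, "") in a UTF-8 locale, mbrtoc32 should also decode EN SPACE and EM SPACE, and the result then no longer depends on how the libc's iswblank classifies them.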