[CCing bug-gnulib]

Collin Funk wrote:
> On Alpine this is because iswblank in UTF-8 locales is the same as
> isblank in the C locale. This means it only returns 1 for U+0009 TAB and
> U+0020 SPACE.

POSIX allows iswblank to behave this way:
https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/iswblank.html
https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/basedefs/V1_chap07.html

> I am thinking of fixing this in Gnulib since I think the proper behavior
> of iswblank in a UTF-8 locale is to return 1 for U+2002 EN SPACE and
> U+2003 EM SPACE.

While I agree that it looks better to us if iswblank returns true for EN SPACE
and EM SPACE, I don't think it's such a severe bug that Gnulib should work
around it. I see it more as a quality-of-implementation issue. The authors of
other libcs are entitled to have different viewpoints than we have.

Otherwise, where would we stop? Should we completely override a libc's character
classifications with tables from gnulib or from glibc? For most purposes, this
would be overkill.

You can decide to make 'fold' behave as you expect, for instance by using
the function uc_is_blank (from <unictype.h>) instead of iswblank. (Note that
uc_is_blank takes a char32_t as argument, though, not a wchar_t or wint_t.)
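
A minimal sketch of that approach, to illustrate the char32_t point. It is
not the actual 'fold' code. The helper names is_blank_c32 and
first_char_is_blank and the USE_UNICTYPE macro are hypothetical. Without
USE_UNICTYPE the sketch falls back to a tiny stand-in table that covers only
the four characters discussed in this thread, so that it compiles without
libunistring. The stand-in is an assumption, not the real blank set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

#ifdef USE_UNICTYPE
# include <unictype.h>          /* uc_is_blank, from Gnulib / libunistring */
#endif

/* Classify one Unicode code point, independently of the libc's tables.  */
bool
is_blank_c32 (char32_t uc)
{
#ifdef USE_UNICTYPE
  return uc_is_blank (uc);
#else
  /* Hypothetical stand-in, deliberately incomplete: TAB, SPACE,
     EN SPACE, EM SPACE.  */
  return uc == 0x0009 || uc == 0x0020 || uc == 0x2002 || uc == 0x2003;
#endif
}

/* Decode the first multibyte character of S in the current locale to a
   char32_t (not a wchar_t) and classify it.  Returns false for an empty
   string or an invalid or incomplete sequence.  */
bool
first_char_is_blank (const char *s)
{
  mbstate_t state;
  char32_t c;
  size_t n;

  memset (&state, 0, sizeof state);
  n = mbrtoc32 (&c, s, strlen (s) + 1, &state);
  if (n == 0 || n == (size_t) -1 || n == (size_t) -2 || n == (size_t) -3)
    return false;
  return is_blank_c32 (c);
}
```

Going through mbrtoc32 avoids assuming that wchar_t values are Unicode code
points, which is not guaranteed on every platform. For non-ASCII input the
program must first select a UTF-8 locale with setlocale.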

Bruno