[CCing bug-gnulib]
Gavin Smith wrote:
> > I guess you will need to look at the Unicode characters that you pass to
> > c32width, and whether you get return values < 1 for some of them.
> 
> It is locale-dependent!
> 
> It looks like c32width is simply being redirected to wcwidth which then
> doesn't work properly with LC_ALL=C.  This is from the gnulib module
> c32width.
> 
> I don't know if there is an easy way to make a self-contained example
> to show the difference, because it needs all the gnulib Makefile machinery,
> but the difference shows up for any non-ASCII character.  If I add a line
> like
> 
>  fprintf (stderr, "width of [%4.0lx] is %d (remaining %s)\n",
>                     (unsigned long) wc, width, q);
> 
> in the right place in the code, where width is the result of c32width,
> then the output looks like
> 
> width of [  40] is 1 (remaining @)
> width of [  4f] is 1 (remaining OE )
> width of [  45] is 1 (remaining E )
> width of [ 152] is -1 (remaining Œ)
> width of [  28] is 1 (remaining (Œ)
> 
> for LC_ALL=C, but
> 
> width of [  40] is 1 (remaining @)
> width of [  4f] is 1 (remaining OE )
> width of [  45] is 1 (remaining E )
> width of [ 152] is 1 (remaining Œ)
> width of [  28] is 1 (remaining (Œ)
> 
> otherwise (LC_ALL=en_GB.UTF-8).

Indeed, the c32* functions by design work only on those Unicode characters
that can be represented as multibyte sequences in the current locale.
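
To see the locale dependence without the gnulib Makefile machinery, a
minimal sketch along these lines may help. It calls plain wcwidth, which
is what c32width ends up using on the system in the report above. The
helper name width_in_locale and the -2 sentinel are made up for this
example, and it assumes a platform such as glibc where wchar_t values are
Unicode code points.

```c
#define _XOPEN_SOURCE 700   /* makes <wchar.h> declare wcwidth() */
#include <locale.h>
#include <wchar.h>

/* Hypothetical helper, not part of gnulib or texinfo:
   switch the whole program to LOCALE_NAME and return wcwidth (WC).
   Returns -2 if the locale is not available; wcwidth itself returns
   -1 for characters it considers non-printable in that locale.
   Assumption: wchar_t holds Unicode code points (true on glibc).  */
static int
width_in_locale (const char *locale_name, wchar_t wc)
{
  if (setlocale (LC_ALL, locale_name) == NULL)
    return -2;
  return wcwidth (wc);
}
```

Comparing width_in_locale ("C", 0x152) with
width_in_locale ("en_GB.UTF-8", 0x152) should show the same kind of
difference as the output quoted above, provided the UTF-8 locale is
installed. The exact values depend on the C library.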

I'll document this better in the Gnulib manual.

Since you want texinfo to work on UTF-8 encoded text with characters outside
the repertoire of the current locale, you'll need the libunistring functions,
documented in
<https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html>.
Namely, replace c32width with uc_width.

Bruno



