Re: multibyte characters in the Info reader

Gavin Smith Thu, 22 Jan 2026 08:20:53 -0800

On Thu, Jan 22, 2026 at 01:16:36PM +0100, Patrice Dumas wrote:
> It seems to me that the column width of multibyte characters, typically
> ideograms, is not taken into account.  I do not know if it is on
> purpose, nor if it is easy to do with the current code, but in case it
> could be useful, here is how it is done in texi2any with the help of
> libunistring, for a string already UTF-8 encoded:
> 
>  uint8_t *u8_text = (uint8_t *) text;
>  int width = u8_strwidth (u8_text, "UTF-8");
> 
>  u8_width could also be used once all the bytes of a single UTF-8
>  character have been collected.


Yes, as I said, I don't think it is worth it:

> > Reading the UTF-8 sequence, obtaining the codepoint and calling wcwidth
> > seems to me to be an unnecessary complication for a marginal use case.
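
For illustration, here is roughly what "reading the UTF-8 sequence, obtaining the codepoint and calling wcwidth" would involve without libunistring. This is a minimal sketch, not code from the Info reader: the function names utf8_decode and utf8_char_width are hypothetical. It assumes wcwidth() is available (it is an XSI function, absent from some C runtimes) and that wchar_t can hold any Unicode code point, which holds on glibc but not where wchar_t is 16 bits, as on MS-Windows. The result also depends on the current LC_CTYPE locale.

```c
#define _XOPEN_SOURCE 700   /* expose wcwidth() in <wchar.h> */
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: decode one UTF-8 sequence at S, with N bytes
   available.  Store the code point in *CP and return the number of
   bytes consumed, or 0 if the sequence is invalid or truncated.  */
static size_t
utf8_decode (const unsigned char *s, size_t n, uint32_t *cp)
{
  size_t len, i;
  uint32_t c;

  if (n == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];              /* plain ASCII byte */
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)
    { len = 2; c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0)
    { len = 3; c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0)
    { len = 4; c = s[0] & 0x07; }
  else
    return 0;                  /* stray continuation byte or bad lead byte */

  if (n < len)
    return 0;                  /* truncated sequence */
  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;              /* expected a continuation byte */
      c = (c << 6) | (s[i] & 0x3F);
    }
  /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.  */
  if ((len == 2 && c < 0x80) || (len == 3 && c < 0x800)
      || (len == 4 && c < 0x10000)
      || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
    return 0;
  *cp = c;
  return len;
}

/* Hypothetical helper: column width of the character at S (N > 0 bytes
   available), as reported by wcwidth().  Invalid bytes and characters
   wcwidth() rejects are counted as one column; *CONSUMED receives the
   number of bytes to advance.  */
static int
utf8_char_width (const unsigned char *s, size_t n, size_t *consumed)
{
  uint32_t cp;
  size_t len = utf8_decode (s, n, &cp);
  int w;

  if (len == 0)
    {
      *consumed = 1;           /* skip a single bad byte */
      return 1;
    }
  *consumed = len;
  w = wcwidth ((wchar_t) cp);  /* locale-dependent */
  return w < 0 ? 1 : w;
}
```

Even this small sketch needs validation logic, a locale that knows the characters, and a wide wchar_t, which illustrates the extra code being weighed here against libunistring's u8_width.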

I don't see why we should add a libunistring dependency and more code to deal
with this case.

It is only a problem on particular terminals on MS-Windows (which is an OS
family of secondary importance for the GNU Project).

Short of rewriting the whole program to use libunistring instead of locale-
dependent functions, there will likely be broken behaviour in one place or
another, anyway.

The fix I posted seems to work well enough and allows users to read UTF-8
manuals under such conditions.
