On Thu, Jan 22, 2026 at 01:16:36PM +0100, Patrice Dumas wrote:
> It seems to me that the column number of multibyte characters, typically
> for ideograms is not taken into account. I do not know if it is on
> purpose, nor if it is easy to do with the current code, but in case it
> could be useful, here is how it is done in texi2any with the help of
> libunistring, for a string already UTF-8 encoded:
>
>   uint8_t *u8_text = (uint8_t *) text;
>   int width = u8_strwidth (u8_text, "UTF-8");
>
> u8_width could also be used after the number of bytes for an UTF-8
> character have been collected.
Yes, as I said, I don't think it is worth it:

> > Reading the UTF-8 sequence, obtaining the codepoint and calling wcwidth
> > seems to me to be an unnecessary complication for a marginal use case.

I don't see why we should add a libunistring dependency and more code to
deal with this case. It is only a problem on particular terminals on
MS-Windows (which is an OS family of secondary importance for the GNU
Project). Short of rewriting the whole program to use libunistring
instead of locale-dependent functions, there will likely be broken
behaviour in one place or another anyway.

The fix I posted seems to work well enough and allows users to read UTF-8
manuals under such conditions.
