> From: Bruno Haible <[email protected]>
> Cc: [email protected], [email protected]
> Date: Thu, 15 Jan 2026 22:33:58 +0100
>
> > Is there a way to use (B) when the locale's codeset is UTF-8 and use
> > (A) otherwise?
>
> It *is* possible with Gnulib if you use the different APIs:
>   - u8_mbtouc etc. for the case (U) above,
>   - mbrtowc (or better: mbrtoc32) etc. for the case (L) above.
>
> Upon first sight, this would mean that you would need to duplicate
> the logic of the functions 'find_diff', 'display_process_line',
> 'printed_representation', 'display_update_node_text' for the two cases.
>
> But that would be unmaintainable.
>
> To avoid such code duplication, I can think of two maintainable
> approaches:
>
>   * You could declare libiconv a mandatory dependency, e.g. like in
>     gettext/DEPENDENCIES:
>       [...]
>     This means that case (L) would not occur any more, and you could use
>     the libunistring API (u8_mbtouc etc.) throughout display.c.

Gavin, is this acceptable for you?  If yes, it would be best, I think.
(You could decide to defer such changes to the next release, of
course.)

>   * Alternatively, you could create an extended copy of gnulib/lib/mbchar.h,
>     defining an abstract "multibyte character" that is UTF-8 encoded in case
>     (U) and locale encoded in case (L), i.e. depending on a global variable.
>     And then, an equally extended copy of gnulib/lib/mbiter.h, defining the
>     iterator over such multibyte characters.

Yes.  It's up to you as a Gnulib developer, but I tend to think that,
since the Windows runtime supports characters beyond the BMP poorly
or not at all, replacing its standard C functions in Gnulib with
versions that accept char32_t code points, and that pay attention to
the console's output codepage rather than the system locale's
codeset, will allow GNU projects to have decent Unicode support on
Windows.  So maybe in the long run Gnulib should add the
above-mentioned extensions to its mbchar.h and mbiter.h.

Thanks.
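
To make the second approach concrete, here is a rough sketch of what such an abstract "multibyte character" and its iterator might look like. It is only an illustration under stated assumptions, not the Gnulib mbchar.h/mbiter.h API: the names text_is_utf8, xchar_t, decode_utf8, xchar_next and count_chars are all hypothetical, and the small hand-written UTF-8 decoder merely stands in for u8_mbtouc so that the example builds without libunistring. Case (L) goes through the standard mbrtowc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical global switch: true for case (U), false for case (L). */
bool text_is_utf8 = true;

/* Hypothetical abstract character: where it starts, how many bytes it
   occupies, and its value (a Unicode code point in case (U), the
   locale's wchar_t value in case (L)). */
typedef struct {
  const char *ptr;
  size_t bytes;
  bool valid;
  unsigned long value;
} xchar_t;

/* Stand-in for u8_mbtouc: decode one UTF-8 sequence from S (N bytes
   available).  Returns its length, or 0 if it is invalid or truncated. */
size_t decode_utf8(const unsigned char *s, size_t n, unsigned long *out)
{
  unsigned long c = s[0], min;
  size_t len, i;

  if (c < 0x80) { *out = c; return 1; }
  else if ((c & 0xE0) == 0xC0) { len = 2; c &= 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0) { len = 3; c &= 0x0F; min = 0x800; }
  else if ((c & 0xF8) == 0xF0) { len = 4; c &= 0x07; min = 0x10000; }
  else return 0;
  if (n < len) return 0;
  for (i = 1; i < len; i++) {
    if ((s[i] & 0xC0) != 0x80) return 0;
    c = (c << 6) | (s[i] & 0x3F);
  }
  /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
  *out = c;
  return len;
}

/* Fetch the next character from P (N > 0 bytes available).  An invalid
   sequence is returned as a single byte with valid == false, so the
   caller always makes progress. */
xchar_t xchar_next(const char *p, size_t n, mbstate_t *state)
{
  xchar_t ch = { p, 1, false, 0 };

  assert(n > 0);
  if (text_is_utf8) {
    unsigned long c;
    size_t len = decode_utf8((const unsigned char *) p, n, &c);
    if (len > 0) { ch.bytes = len; ch.valid = true; ch.value = c; }
  } else {
    wchar_t wc;
    size_t len = mbrtowc(&wc, p, n, state);
    if (len == (size_t) -1 || len == (size_t) -2) {
      memset(state, 0, sizeof *state);  /* back to the initial state */
    } else {
      ch.bytes = len ? len : 1;         /* mbrtowc returns 0 for NUL */
      ch.valid = true;
      ch.value = (unsigned long) wc;
    }
  }
  return ch;
}

/* Example of a loop written once and shared by both cases. */
size_t count_chars(const char *s, size_t n)
{
  mbstate_t state;
  size_t count = 0;

  memset(&state, 0, sizeof state);
  while (n > 0) {
    xchar_t ch = xchar_next(s, n, &state);
    s += ch.bytes;
    n -= ch.bytes;
    count++;
  }
  return count;
}
```

The point of the sketch is that a loop such as count_chars is written only once; functions like 'find_diff' or 'display_process_line' would be shaped the same way, with the encoding decision confined to xchar_next.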
