(Reply accidentally not sent to list.)

----- Forwarded message from Gavin Smith -----
Date: Thu, 9 Nov 2023 20:11:59 +0000
From: Gavin Smith
To: Bruno Haible
Subject: Locale-independent paragraph formatting [was Re: Texinfo 7.0.93 pretest available]

On Tue, Oct 10, 2023 at 07:29:15PM +0200, Bruno Haible wrote:
> Given that the only encoding you want to deal with is UTF-8, Eli's suggestion
> to use GNU libunistring is better than my iconv() suggestion. It has functions
> for width determination:
> https://www.gnu.org/software/libunistring/manual/html_node/uniwidth_002eh.html
>
> > but I doubt it is urgent to do before the release, as the current approach,
> > however flawed, has been in place and worked fairly well for a long time
> > (since the XS paragraph module was written).
>
> Well, it does not work on Windows.
>
> I agree with you that it's not urgent to do before the 7.1 release, since
> the Windows port is work-in-progress.

I have just pushed a commit (e3a28cc9bf) that uses gnulib/libunistring
functions instead of the locale-dependent functions mbrtowc and wcwidth.
This allows a significant simplification, as we no longer have to try to
switch to a UTF-8 encoded locale.

I was not sure how to write a char32_t literal in the source code. For
example, where we previously had L'a' as a wchar_t literal for the letter
'a', I changed this to U'a'. I could not find much information online
about this, or about whether it is widely supported by C compilers. The U
prefix for char32_t is mentioned in a copy of the C11 standard I found
online and also in a C23 draft, in section 6.4.4.4 "Character constants"
(page 67):

    A wide character constant prefixed by the letter L has type wchar_t,
    an integer type defined in the <stddef.h> header; a wide character
    constant prefixed by the letter u or U has type char16_t or char32_t,
    respectively, unsigned integer types defined in the <uchar.h> header.
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf

The "uchar" or "uchar-c23" modules are supposed to provide uchar.h on
platforms where it doesn't exist, but I highly doubt there is any
mechanism for providing a new character literal syntax if the compiler
does not support it.

Does anybody know whether we could just write 'a' instead of U'a' and
rely on it being converted? E.g. after

    char32_t c = 'a';

c should be equal to 97 (the ASCII value of 'a').

----- End forwarded message -----
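On the question of writing 'a' instead of U'a': in C, a plain character constant such as 'a' has type int and the value of that character in the execution character set. So, assuming an ASCII-compatible execution character set (such as UTF-8), assigning 'a' to a char32_t gives 97, the same value as U'a', with no U prefix required. The sketch below is a minimal illustration under that assumption, not Texinfo code: it states the assumption as a compile-time check that does not itself use U'...', and `is_ascii_letter` is a hypothetical helper written with plain constants.

```c
#include <assert.h>
#include <uchar.h>

/* Assumption made explicit: the execution character set is
   ASCII-compatible, so a plain constant like 'a' has the same value as
   the code point U+0061.  The check deliberately avoids U'...' so that
   it does not depend on compiler support for the U prefix.  On a
   platform where it fails (e.g. EBCDIC), the build stops here.  */
static_assert ('a' == 0x61 && 'z' == 0x7A && 'A' == 0x41 && 'Z' == 0x5A
               && '0' == 0x30 && ' ' == 0x20,
               "execution character set is not ASCII-compatible");

/* Hypothetical helper, for illustration only: true if C is an ASCII
   letter.  The plain 'a'..'z' and 'A'..'Z' constants are ints with
   small non-negative values, so converting them to char32_t for the
   comparisons is exact.  This does not extend to non-ASCII constants:
   '\xe9' is negative where char is signed, and converting it to
   char32_t does not give U+00E9.  */
static int
is_ascii_letter (char32_t c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
```

The restriction to the ASCII range matters. Outside it, a plain char constant goes through the (possibly signed) char type before becoming an int, so it is not a safe substitute for a U'...' literal.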
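The commit described above replaces mbrtowc and wcwidth, whose behaviour depends on the current locale, with gnulib/libunistring functions. libunistring provides, for example, u8_mbtouc in <unistr.h> and uc_width in <uniwidth.h>, but the exact calls used by the commit are not shown in this message. To illustrate why no UTF-8 locale is needed any more, here is a dependency-free sketch of the decoding half only: a hand-written UTF-8 decoder into char32_t that consults no locale. `decode_utf8` and `count_code_points` are hypothetical helpers, not libunistring's API, and display width is not handled (that would still come from something like uc_width).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: decode one UTF-8 sequence from the N bytes at S
   into *OUT.  Returns the number of bytes consumed (1 to 4), or 0 if the
   input is empty, truncated or not valid UTF-8.  Unlike mbrtowc, the
   result does not depend on the current locale.  */
static size_t
decode_utf8 (const unsigned char *s, size_t n, char32_t *out)
{
  char32_t c;
  size_t len, i;

  if (n == 0)
    return 0;
  if (s[0] < 0x80)                      /* 0xxxxxxx: ASCII */
    {
      *out = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)       /* 110xxxxx: 2-byte sequence */
    { c = s[0] & 0x1F; len = 2; }
  else if ((s[0] & 0xF0) == 0xE0)       /* 1110xxxx: 3-byte sequence */
    { c = s[0] & 0x0F; len = 3; }
  else if ((s[0] & 0xF8) == 0xF0)       /* 11110xxx: 4-byte sequence */
    { c = s[0] & 0x07; len = 4; }
  else
    return 0;                           /* stray continuation or 0xF8..0xFF */

  if (n < len)
    return 0;                           /* truncated sequence */
  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)        /* must be 10xxxxxx */
        return 0;
      c = (c << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong encodings, UTF-16 surrogates and values past U+10FFFF.  */
  if ((len == 2 && c < 0x80) || (len == 3 && c < 0x800)
      || (len == 4 && c < 0x10000)
      || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
    return 0;

  *out = c;
  return len;
}

/* Hypothetical helper: count the code points in the NUL-terminated UTF-8
   string STR, or return (size_t) -1 if it is not valid UTF-8.  */
static size_t
count_code_points (const char *str)
{
  const unsigned char *s = (const unsigned char *) str;
  size_t n, count = 0;

  assert (str != NULL);
  n = strlen (str);
  while (n > 0)
    {
      char32_t c;
      size_t len = decode_utf8 (s, n, &c);

      if (len == 0)
        return (size_t) -1;
      s += len;
      n -= len;
      count++;
    }
  return count;
}
```

Because the input encoding is fixed as UTF-8, the decoder needs only bit operations on the bytes. That is the simplification the commit relies on: there is nothing to gain from switching locales first.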
