On Wed, Aug 20, 2025 at 04:33:47PM +0200, Ingo Schwarze wrote:
> Hello Walter,
>
> Walter Alejandro Iglesias wrote on Wed, Aug 20, 2025 at 09:18:52AM +0200:
> > On Tue, Aug 19, 2025 at 05:39:13PM +0200, Ingo Schwarze wrote:
> >> Walter Alejandro Iglesias wrote on Mon, Aug 18, 2025 at 06:40:04PM +0200:
>
> >>> #define period 0x2e
> >>> #define question 0x3f
> >>> #define exclam 0x21
> >>> #define ellipsis L'\u2026'
> >>> const wchar_t p[] = { period, question, exclam, ellipsis };
>
> >> In addition to what otto@ said, this is bad style for more than one
> >> reason.
> >>
> >> First of all, the data type of the constant "0x2e" is "int",
> >> see for example C11 6.4.4.1 (Integer constants). Casting "int"
> >> to "wchar_t" doesn't really make sense. On OpenBSD, it only
> >> works because UTF-8 is the only supported character encoding *and*
> >> wchar_t stores Unicode codepoints. But neither of these choices
> >> are portable. What you want is (C11 6.4.4.4 Character constants):
> >>
> >> #define period L'.'
> >> #define question L'?'
> >> #define exclam L'!'
>
> > As I made this change to my code (https://en.roquesor.com/fmtroff.html)
> > the following reminded me why, at some point, I decided to switch to
> > hexadecimal notation.
> >
> > #define backslash L'\\'
> > #define apostrophe L'\''
> >
> > It isn't very confusing there, but among the arguments of a function or
> > a conditional...
>
> Making code look nice is nice to have and can even make code more
> readable and hence reduce the likelihood of bugs. But even if you
> are coding with narrow strings for ASCII only, whether
>
> char mychar = 0x5c;
> char mychar = 92;
> char mychar = 0134;
>
> is more readable than
>
> char mychar = '\\';
>
> is debatable; at least i would find reading the latter easier than
> the former, even in a conditional or function call argument.
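A minimal sketch of the character-constant style suggested above, applied to the punctuation array from the original post. The terminating L'\0' and the is_sentence_end() helper are illustrative assumptions, not code taken from fmtroff.c:

```c
#include <stdbool.h>
#include <wchar.h>

/*
 * Character constants instead of integer constants such as 0x2e.
 * A constant like L'.' already has type wchar_t and the value of that
 * character in the implementation's execution wide character set.
 */
#define period   L'.'
#define question L'?'
#define exclam   L'!'
#define ellipsis L'\u2026'	/* HORIZONTAL ELLIPSIS */

/* The array from the original post; the terminating L'\0' is an
 * addition (an assumption) so that wcschr() can search it. */
static const wchar_t p[] = { period, question, exclam, ellipsis, L'\0' };

/* Hypothetical helper: does wc end a sentence? */
static bool
is_sentence_end(wchar_t wc)
{
	/* wcschr() would also match the terminating L'\0', so exclude it. */
	return wc != L'\0' && wcschr(p, wc) != NULL;
}
```

Since each macro already expands to a wchar_t constant, the initializer needs no int-to-wchar_t conversion, and none of the values assume a particular character encoding.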
If it weren't for my dislike of UTF-8 characters in code (I code with
vi(1) from base), I would write the characters themselves directly,
both narrow and wide. That's undoubtedly the most human-readable
option. :-)

>
> For narrow characters, the portability argument is weak; writing
> code that is portable to EBCDIC machines is the kind of excessive
> portability that provokes bugs rather than prevents them. But still,
> i'd recommend against specifying narrow characters numerically.
> Even mandoc_char(7) says:
>
>    NUMBERED CHARACTERS
>       For backward compatibility with existing manuals, mandoc(1)
>       also supports the
>             \N'number' and \[charnumber]
>       escape sequences, inserting the character number from the
>       current character set into the output. Of course, this is
>       inherently non-portable and is already marked as deprecated
>       in the Heirloom roff manual; on top of that, the second form
>       is a GNU extension. For example, do not use \N'34' or
>       \[char34], use \(dq, or even the plain `"' character where
>       possible.

In my Groff files for Spanish, I write all UTF-8 characters as is,
except for the ellipsis, which is out of the reach of preconv(1), so
I added a definition for it to my macros.

>
> A similar recommendation makes sense for C code.
>
> What *is* portable is specifying wide characters by Unicode
> codepoint numbers, for example:
>
> wchar_t mywide = L'\u2026'; /* horizontal ellipsis */
>
> But note that the C standard (C11 6.4.3.2 Universal character names)
> explicitly requires the argument to \u to be at least 00A0,
> with only three exceptions:
>
> L'\u0024' == L'$'
> L'\u0040' == L'@'
> L'\u0060' == L'`'
>
> Being so specific is a weird quirk of the standard, but it means
> you had better not abuse \u to obfuscate ASCII codepoints -
> apart from being very ugly, it may not even work.
> For example, current base clang dies like this:
>
> error: character 'A' cannot be specified by a universal character name
> 13 | wchar_t mywide = L'\u0041';
> 1 error generated.
>
> So there is no real alternative to L'\\'. While L'\x5c' and L'\134'
> work for UTF-8 (and hence on OpenBSD), even that is not guaranteed
> to be portable, and what those two produce may depend both on the
> implementation and on the locale.

I have already changed all my ASCII character definitions to the
notation you advised and kept the L'\u????' notation for the UTF-8
ones:

https://en.roquesor.com/Downloads/fmtroff.c

I mention your help here:

https://en.roquesor.com/fmtroff.html

Learning as I go. :-)

>
> Yours,
> Ingo
>

-- 
Walter
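The scheme described at the end of this message (plain character constants for ASCII, universal character names for everything else) might look like the minimal sketch below. The macro names hellip and ntilde and the count_special() helper are hypothetical; the static assertions check the three sub-U+00A0 exceptions quoted from C11:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* ASCII: plain character constants, even for the awkward ones. */
#define backslash  L'\\'
#define apostrophe L'\''

/* Non-ASCII: universal character names (C11 6.4.3). */
#define hellip     L'\u2026'	/* HORIZONTAL ELLIPSIS */
#define ntilde     L'\u00f1'	/* LATIN SMALL LETTER N WITH TILDE */

/*
 * Below U+00A0, only these three universal character names are
 * allowed, and each denotes the same character as the plain constant.
 * Spelling any other ASCII character this way (U+0041 for 'A', say)
 * is a constraint violation in C11, and current clang rejects it.
 */
static_assert(L'\u0024' == L'$', "U+0024 is DOLLAR SIGN");
static_assert(L'\u0040' == L'@', "U+0040 is COMMERCIAL AT");
static_assert(L'\u0060' == L'`', "U+0060 is GRAVE ACCENT");

/* Hypothetical helper: count backslashes and apostrophes in s. */
static size_t
count_special(const wchar_t *s)
{
	size_t n = 0;

	for (; *s != L'\0'; s++)
		if (*s == backslash || *s == apostrophe)	/* still readable */
			n++;
	return n;
}
```

A check that L'\x5c' equals L'\\' is deliberately left out: it holds on OpenBSD, but as noted above the standard does not guarantee it elsewhere.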