Hello Ingo,

On Tue, Aug 19, 2025 at 05:39:13PM +0200, Ingo Schwarze wrote:
> Hi Walter,
>
> Walter Alejandro Iglesias wrote on Mon, Aug 18, 2025 at 06:40:04PM +0200:
>
> > Question for the experts.  Let's take the following example:
> >
> > ----->8------------->8--------------------
> > #include <stdio.h>
> > #include <string.h>
> > #include <wchar.h>
> >
> > #define period    0x2e
> > #define question  0x3f
> > #define exclam    0x21
> > #define ellipsis  L'\u2026'
> >
> > const wchar_t p[] = { period, question, exclam, ellipsis };
>
> In addition to what otto@ said, this is bad style for more than one
> reason.
>
> First of all, that data type of the constant "0x2e" is "int",
> see for example C11 6.4.4.1 (Integer constants).  Casting "int"
> to "wchar_t" doesn't really make sense.  On OpenBSD, it only
> works because UTF-8 is the only supported character encoding *and*
> wchar_t stores Unicode codepoints.  But neither of these choices
> are portable.  What you want is (C11 6.4.4.4 Character constants):
>
>   #define period    L'.'
>   #define question  L'?'
>   #define exclam    L'!'

As I explain below, I did that in a program I wrote to work with UTF-8
only.  But I'll follow your advice and adopt this practice from now on.

>
> > int
> > main()
> > {
> >     const wchar_t s[] = L". Hello.";
> >
> >     printf("%ls\n", s);
> >     printf("%lu\n", wcsspn(s, p));
>
> The return value of wcsspn(3) is size_t, so this should use %zu.

Yeah, the compiler warned me about this.  I wrote the example
carelessly.

>
> Besides, given that the second argument of wcsspn(3)
> takes "const wchar_t *", why not simply:
>
>   const wchar_t *p = L".?!\u2026";

I'd tried this:

  const wchar_t p[] = L".?!\u2026";

and saw that it solved the problem, *but I didn't understand why*.  My
mistake was assuming that, since this syntax doesn't require specifying
the length in the brackets, neither did the one I used.

By the way, the program where I experienced the failures is this:

  https://en.roquesor.com/Downloads/fmtroff.c

As you can see in the code, my intention was to define all the
characters in a legible, clear, and practical way but, after
encountering this problem, I seriously wondered whether I'd made my
life complicated by writing it like this.

>
> And finally, if you want wchar_t to store UTF-8 strings, you need
> something like
>
>   #include <err.h>
>   #include <locale.h>
>
>   if (setlocale(LC_CTYPE, "C.UTF-8") == NULL)
>       errx(1, "setlocale failed");
>
> Otherwise, the C library function operating on wide strings
> assume that wchar_t only stores ASCII character numbers.
> Even printf(3) %ls won't work for UTF-8 characters without
> setting the locale properly.

Yes, it was an oversight on my part not to include setlocale() in the
example.  By the way, if you take a look at fmtroff.c you'll see this
line:

  setlocale(LC_CTYPE, "");

My intention with fmtroff was to have it work only with UTF-8, so at
first I used the UTF-8 specification in setlocale() as in your example.
Later I decided to leave that field empty because, after testing under
Linux, I found that the program also does its job under other locales,
except that it doesn't take advantage of the hardcoded UTF-8
punctuation.  As usual with wide-character functions, the problem comes
when, under a UTF-8 locale, you edit a file containing invalid UTF-8
sequences.  My previous version of the program was written without
wide-char functions and, like fmt(1) from base, it doesn't have this
problem.  Each version has its pros and cons.  I use it as a more
suitable version of fmt(1) to edit my novels in Spanish with Groff.

>
> Yours,
>   Ingo
>

-- 
Walter
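
For reference, on the array question above: wcsspn(3) expects both of
its arguments to be NUL-terminated wide strings.  A string literal such
as L".?!\u2026" gets its terminator from the compiler, whereas a
brace-enclosed list contains only the elements written in it, so such a
list has to end with an explicit L'\0'.  The sketch below assumes that
the missing terminator was the cause of the failures discussed in this
thread; the macro, array and function names are hypothetical and are
not taken from fmtroff.c.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical names, chosen for illustration only. */
#define PERIOD    L'.'
#define QUESTION  L'?'
#define EXCLAM    L'!'
#define ELLIPSIS  L'\u2026'

/*
 * Brace-enclosed list: only the listed elements are stored, so the
 * terminating L'\0' has to be written explicitly.
 */
const wchar_t punct_list[] = { PERIOD, QUESTION, EXCLAM, ELLIPSIS, L'\0' };

/* String literal: the compiler appends the terminating L'\0' itself. */
const wchar_t punct_lit[] = L".?!\u2026";

/* Count the leading characters of s that belong to the punctuation set. */
size_t
leading_punct(const wchar_t *s)
{
	assert(s != NULL);
	return wcsspn(s, punct_list);
}
```

Called as leading_punct(L". Hello."), the function should return 1.
Since the result is a size_t, it is printed with %zu, and
setlocale(LC_CTYPE, "") still has to be called first if the string
itself is to be printed with %ls.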