Hi Paul,

> >> Could we have a macro to be used only in source code encoded via UTF-8?
> >> Presumably the older compilers would process the bytes of the string as
> >> if they were individual 8-bit characters and would pass them through
> >> unchanged, so the run-time string would be UTF-8 too.
> >
> > This would allow writing a macro that prefixes "u8" to strings in
> > compilers supporting enough of C11, skipping the prefix in compilers
> > that pass UTF-8 encoded bytes in strings unchanged.
>
> Yes, that was the idea.
Did you mean
  (1) that the programmer shall define a macro that indicates that their
      source code is UTF-8 encoded?
Or did you mean
  (2) that gnulib shall define a macro that shall _assume_ that the source
      code is UTF-8 encoded, and then expand to u8"xyz" instead of "xyz"?

Recall that the programmer does not usually tell GCC through command-line
options what the source encoding is. GCC has the options -finput-charset
and -fexec-charset [1], but I have never seen them being used. Also, UTF-8
is the de-facto standard now: 99% of web pages are in UTF-8, and likely
more than 95% of source code as well. And on z/OS, users are not using GCC
but the vendor compiler, which - as I said - does not have compiler
support that could reasonably be used.

For (1) to work, this macro would need to be defined in each source file,
after the #include statements - since the included header files, possibly
from other packages, can be in a different source encoding. Few
programmers will want to do this.

For (2): what's the point? Once you assume that the source code is UTF-8
encoded, ISO C11 section 6.4.5 says that u8"xyz" and "xyz" are the same:
literals of type 'char *'.

Bruno

[1] https://gcc.gnu.org/onlinedocs/gcc-9.3.0/gcc/Preprocessor-Options.html