Hi Paul,

> >> Could we have a macro to be used only in source code encoded via UTF-8?
> >> Presumably the older compilers would process the bytes of the string as
> >> if they were individual 8-bit characters and would pass them through
> >> unchanged, so the run-time string would be UTF-8 too.
> >
> > This would allow writing a macro that prefixes "u8" to strings in
> > compilers supporting enough of C11, skipping the prefix in compilers
> > that pass UTF-8 encoded bytes in strings unchanged.
>
> Yes, that was the idea.
Did you mean
  (1) that the programmer shall define a macro that indicates that their
      source code is UTF-8 encoded?
Or did you mean
  (2) that gnulib shall define a macro that shall _assume_ that the source
      code is UTF-8 encoded, and then expand to u8"xyz" instead of "xyz"?

Recall that the programmer does not usually tell GCC through command-line
options what the source encoding is. GCC has the options -finput-charset
and -fexec-charset [1], but I have never seen them being used. Also, UTF-8
is the de-facto standard now: 99% of web pages are in UTF-8, and likely
more than 95% of source code as well. And on z/OS, users are not using GCC
but the vendor compiler, which - as I said - does not have compiler
support that could reasonably be used.

For (1) to work, this macro would need to be defined in each source file,
after the #include statements - since the included header files, possibly
from other packages, can be in a different source encoding. Few
programmers will want to do this.

For (2): what's the point? Once you assume that the source code is UTF-8
encoded, ISO C11 section 6.4.5 says that u8"xyz" and "xyz" are the same:
literals of type 'char *'.

Bruno

[1] https://gcc.gnu.org/onlinedocs/gcc-9.3.0/gcc/Preprocessor-Options.html