Re: Unicode string literals

Paul Eggert Fri, 01 May 2020 14:23:11 -0700

On 5/1/20 2:01 AM, Bruno Haible wrote:

> Did you mean (1) that the programmer shall define a macro, that indicates that
> their source code is UTF-8 encoded?
> 
> Or did you mean (2) that gnulib shall define a macro, that shall _assume_ that
> the source code is UTF-8 encoded, and then expand to u8"xyz" instead of "xyz"?


Yes, I meant (2).

> For (2): what's the point? Once you assume that the source code is UTF-8
> encoded, ISO C11 section 6.4.5 says that u8"xyz" and "xyz" are the same:
> literals of type 'char *'.

I was thinking about the case where one develops and normally builds on systems
that assume UTF-8 source code (perhaps because the build system is old and just
compiles the bytes unchecked), but where on occasion a builder translates all
the source code to (say) EUC-JP for whatever reason, and then compiles on a
newer platform that supports the u8 prefix.
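
To make (2) concrete, here is a rough sketch of the sort of macro I had in mind.
The name UTF8_LITERAL is made up for illustration and is not an existing gnulib
macro, and the sketch assumes a C11-or-later compiler that accepts the u8
prefix. It spells the non-ASCII character as a universal character name so the
example itself does not depend on how the source file is encoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro name, not part of gnulib.  Pasting the u8 prefix onto
   a string literal asks a C11-or-later compiler for UTF-8 encoded bytes,
   independently of the execution character set.  */
#define UTF8_LITERAL(s) u8##s

/* Return 1 if the literal's bytes are the UTF-8 encoding of U+00E9
   (LATIN SMALL LETTER E WITH ACUTE) plus the terminating NUL, else 0.
   memcmp is used rather than strcmp because the element type of a u8
   literal differs between C11 (char) and C23 (char8_t).  */
int
e_acute_is_utf8 (void)
{
  static const unsigned char expected[] = { 0xC3, 0xA9, 0x00 };
  return (sizeof UTF8_LITERAL ("\u00e9") == sizeof expected
          && memcmp (UTF8_LITERAL ("\u00e9"), expected, sizeof expected) == 0);
}
```

On a pre-C11 compiler the macro would presumably have to expand to the plain
literal instead, which is where the "assume the source is UTF-8" part comes in.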

Admittedly the scenario is unlikely. I suppose we should wait until a real need
arises before worrying about it.

This all reminds me of trigraphs somehow
<https://en.wikipedia.org/wiki/Digraphs_and_trigraphs>. What a pain that was,
and still is.
