https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102613
--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <[email protected]>:

https://gcc.gnu.org/g:bb5ebc937329196eca404385b4388352ae568a86

commit r16-7104-gbb5ebc937329196eca404385b4388352ae568a86
Author: Jakub Jelinek <[email protected]>
Date:   Wed Jan 28 10:23:20 2026 +0100

c++: Implement C++23 P2246R1 - Character encoding of diagnostic text

The following patch attempts to implement the C++23 P2246R1 Character encoding of diagnostic text paper. Initially I thought there was nothing to do, but this patch shows that there is (and I wonder if we shouldn't backport it to release branches). The patch is on top of the cpp_translate_string libcpp addition from the reflection patchset (though that is a quite small change that could be backported too).

We have various different encodings in play in GCC. There is -finput-charset=, defaulting to SOURCE_CHARSET, which is almost always UTF-8 (but in theory could be UTF-EBCDIC if that really works). libcpp initially converts the source from the input charset to SOURCE_CHARSET. Then we have -fexec-charset=, again defaulting to SOURCE_CHARSET, and -fwide-exec-charset=, then UTF-8, UTF-16 and UTF-32 for u8, u and U string literals and constants, and finally the user uses some character set in the terminal in which gcc is running.

Now, I think we mostly just emit diagnostics in SOURCE_CHARSET. There is the identifier_to_locale function, which uses UCNs if LC_CTYPE CODESET is not UTF-8-ish, but I think we don't use it all the time. Even then, there is really no support for outputting from SOURCE_CHARSET UTF-8 to non-ASCII compatible terminal charsets. So for now let's pretend that we are emitting diagnostics to a UTF-8 capable terminal. When reporting errors about identifiers in the source (which are in SOURCE_CHARSET), we just emit those.
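To make the UCN fallback mentioned above concrete, here is a minimal sketch of the idea: non-ASCII code points in a UTF-8 string are rewritten as \uXXXX or \UXXXXXXXX so the result is safe to print on a terminal that is not UTF-8. The helper name utf8_to_ucn is hypothetical and this is not GCC's identifier_to_locale; it also does not reject overlong encodings or surrogates, which a real implementation would need to consider.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (NOT GCC's identifier_to_locale): rewrite every
// non-ASCII code point of a UTF-8 string as a UCN.  Bytes that do not start
// a well-formed sequence are replaced with '?'.
std::string utf8_to_ucn (const std::string &in)
{
  std::string out;
  for (std::size_t i = 0; i < in.size ();)
    {
      unsigned char c = static_cast<unsigned char> (in[i]);
      if (c < 0x80)
        {
          // Plain ASCII is copied through unchanged.
          out += static_cast<char> (c);
          ++i;
          continue;
        }
      // Sequence length and payload bits come from the lead byte.
      std::size_t len = 0;
      std::uint32_t cp = 0;
      if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
      else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
      else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
      bool ok = len != 0 && i + len <= in.size ();
      // Each continuation byte contributes six more bits.
      for (std::size_t k = 1; ok && k < len; ++k)
        {
          unsigned char t = static_cast<unsigned char> (in[i + k]);
          if ((t & 0xC0) != 0x80)
            ok = false;
          else
            cp = (cp << 6) | (t & 0x3F);
        }
      if (!ok)
        {
          out += '?';
          ++i;
          continue;
        }
      char buf[11]; // backslash, 'U', 8 hex digits, terminating NUL
      if (cp <= 0xFFFF)
        std::snprintf (buf, sizeof buf, "\\u%04X", static_cast<unsigned> (cp));
      else
        std::snprintf (buf, sizeof buf, "\\U%08X", static_cast<unsigned> (cp));
      out += buf;
      i += len;
    }
  return out;
}
```

For example, the UTF-8 bytes of U+00E9 (0xC3 0xA9) come out as the six ASCII characters \u00E9, which any ASCII-compatible terminal can display.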
The paper talks about deprecated & nodiscard attribute msgs, static_assert and #error (and for C++26 it would talk about #warning, delete (reason) and static_assert with constexpr user messages). #error/#warning work fine on UTF-8 terminals, delete (reason) too (we don't translate the string literal from SOURCE_CHARSET to exec-charset in that case), static_assert with a string literal too (again, notranslate), and the __attribute__ form of the deprecated attribute too (again, !parser->translate_strings_p).

What doesn't work properly are C++11 attributes (standard or gnu::): we do translate those to exec charset, except for C++26 standard deprecated/nodiscard (which aren't translated). And static_assert with user messages doesn't work; those really have to be in exec-charset because we have no control over how the user constructs the messages during constexpr evaluation.

So, for C++11 attributes whose first argument is a CPP_STRING, this patch temporarily disables translation of that string, which fixes [[gnu::deprecated ("foo")]], [[gnu::unavailable ("foo")]] and for C++ < 26 also [[deprecated ("foo")]] and [[nodiscard ("foo")]]. The other change is to convert the custom user static_assert messages (and also inline asm strings) back from exec-charset to SOURCE_CHARSET. Without this patch, the worst case for diagnostics is that we show garbage, but for inline asm we actually fail to assemble stuff when users use the constexpr created string views with non-ASCII exec charsets.

2026-01-28  Jakub Jelinek  <[email protected]>

PR c++/102613
* parser.cc: Implement C++23 P2246R1 - Character encoding of diagnostic text.
(cp_parser_parenthesized_expression_list): For std attribute argument where the first argument is CPP_STRING, ensure the string is not translated.
* semantics.cc: Include c-family/c-pragma.h.
(cexpr_str::extract): Use cpp_translate_string to translate string from ordinary literal encoding to SOURCE_CHARSET.
* g++.dg/cpp1z/constexpr-asm-6.C: New test.
* g++.dg/cpp23/charset2.C: New test.
* g++.dg/cpp23/charset3.C: New test.
* g++.dg/cpp23/charset4.C: New test.
* g++.dg/cpp23/charset5.C: New test.
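As an illustration of the constructs the commit message discusses, the following sketch shows string literals that are diagnostic text rather than program data. The function names and message wording are made up for this example and are not taken from the new tests.

```cpp
// The strings below are diagnostic text: the compiler prints them, and the
// program never reads them at run time.

// Standard attribute with a message (C++14).
[[deprecated ("use new_size () instead")]]
int old_size () { return 1; }

// GNU-namespaced C++11-style attribute, one of the forms the commit message
// says was translated to the execution charset before the patch.
[[gnu::deprecated ("groesse () is obsolete, see größe.txt")]]
int groesse () { return 2; }

// Replacement API without any attribute.
int new_size () { return 3; }

// static_assert with a plain string literal; the message is only printed
// when the condition is false.
static_assert (sizeof (char) == 1, "char must be one byte");
```

One way to observe the difference, assuming the host iconv supports the chosen charset, is to add a call to old_size () or groesse () and compile with a non-ASCII-compatible execution charset such as -fexec-charset=IBM1047. Going by the description above, the gnu::deprecated message is the kind that could come out garbled before the patch.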
