[Bug c++/102613] [C++23] P2246R1 - Character encoding of diagnostic text

cvs-commit at gcc dot gnu.org via Gcc-bugs Wed, 28 Jan 2026 01:26:47 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102613


--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <[email protected]>:

https://gcc.gnu.org/g:bb5ebc937329196eca404385b4388352ae568a86

commit r16-7104-gbb5ebc937329196eca404385b4388352ae568a86
Author: Jakub Jelinek <[email protected]>
Date:   Wed Jan 28 10:23:20 2026 +0100

    c++: Implement C++23 P2246R1 - Character encoding of diagnostic text

    The following patch attempts to implement the C++23 P2246R1
    "Character encoding of diagnostic text" paper.
    Initially I thought there was nothing to do, but this patch shows
    that there is (and I wonder whether we should backport it to release
    branches).  Note that the patch is on top of the cpp_translate_string
    libcpp addition from the reflection patchset (though that is a
    quite small change that could be backported too).

    We have various different encodings in play in GCC.
    There is -finput-charset=, defaulting to SOURCE_CHARSET, which is
    almost always UTF-8 (but in theory could be UTF-EBCDIC, if that really
    works).  libcpp initially converts the source from the input charset
    to SOURCE_CHARSET.  Then we have -fexec-charset=, again defaulting to
    SOURCE_CHARSET, and -fwide-exec-charset=; then UTF-8, UTF-16 and UTF-32
    for u8, u and U string literals and character constants; and finally
    whatever character set the user's terminal (the one gcc is running in)
    uses.
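    To make the list of literal encodings concrete, here is a minimal
    sketch (not part of the patch) that counts the code units produced for
    the same character, U+00E9, in each literal kind.  The character is
    spelled as a UCN so the result does not depend on -finput-charset.
    The ordinary-literal count assumes the default UTF-8 -fexec-charset.
    The helper name code_units is made up for this illustration.

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating NUL.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00E9 (LATIN SMALL LETTER E WITH ACUTE) in each literal encoding.
constexpr std::size_t utf8_units  = code_units(u8"\u00E9"); // u8: always UTF-8
constexpr std::size_t utf16_units = code_units(u"\u00E9");  // u:  always UTF-16
constexpr std::size_t utf32_units = code_units(U"\u00E9");  // U:  always UTF-32

// Ordinary literals follow -fexec-charset: 2 units with the UTF-8 default,
// but 1 unit with e.g. -fexec-charset=ISO-8859-1.
constexpr std::size_t narrow_units = code_units("\u00E9");

static_assert(utf8_units == 2 && utf16_units == 1 && utf32_units == 1,
              "the u8/u/U encodings are fixed by the standard");
```

    Only the ordinary (and wide) literals change with command-line options.
    This is why text that reaches diagnostics through them can end up in
    an encoding the terminal does not expect.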

    Now, I think we mostly just emit diagnostics in SOURCE_CHARSET.
    There is an identifier_to_locale function, which uses UCNs if the
    LC_CTYPE CODESET is not UTF-8-ish, but I don't think we use it
    everywhere.  Even then, there is really no support for outputting from
    SOURCE_CHARSET (UTF-8) to terminal charsets that are not
    ASCII-compatible.  So for now let's assume that we are emitting
    diagnostics to a UTF-8 capable terminal.
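    The UCN fallback described for identifier_to_locale can be sketched as
    follows.  This is not GCC's implementation.  The function name
    identifier_for_terminal, its boolean parameter, and the lowercase
    \UXXXXXXXX output format are all assumptions made for illustration.
    It passes UTF-8 through for a UTF-8 terminal and otherwise replaces
    each non-ASCII code point with a universal character name.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: NOT GCC's identifier_to_locale.  If the terminal is
// UTF-8, return the identifier unchanged; otherwise emit ASCII as-is and
// replace every non-ASCII code point with a \UXXXXXXXX UCN.
// Assumes well-formed UTF-8 input; a truncated trailing sequence is dropped.
std::string identifier_for_terminal(const std::string &utf8,
                                    bool terminal_is_utf8)
{
  if (terminal_is_utf8)
    return utf8;

  std::string out;
  std::size_t i = 0;
  while (i < utf8.size())
    {
      unsigned char lead = static_cast<unsigned char>(utf8[i]);
      if (lead < 0x80)
        {
          out += static_cast<char>(lead); // ASCII is emitted unchanged
          ++i;
          continue;
        }
      // Sequence length from the lead byte: 110xxxxx, 1110xxxx, 11110xxx.
      std::size_t len = lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : 2;
      if (i + len > utf8.size())
        break;
      // Keep only the payload bits of the lead byte, then append 6 bits
      // from each continuation byte.
      char32_t cp = lead & (0xFF >> (len + 1));
      for (std::size_t k = 1; k < len; ++k)
        cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
      char ucn[11]; // "\U" + 8 hex digits + NUL
      std::snprintf(ucn, sizeof ucn, "\\U%08x", static_cast<unsigned>(cp));
      out += ucn;
      i += len;
    }
  return out;
}
```

    For example, the UTF-8 identifier "café" would come out as
    caf\U000000e9 on a non-UTF-8 terminal.  That output is ASCII-only and
    readable, though not pretty.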

    When reporting errors about identifiers in the source (which are in
    SOURCE_CHARSET), we just emit them as they are.  The paper talks about
    deprecated and nodiscard attribute messages, static_assert and #error
    (and for C++26 it would also talk about #warning, delete (reason) and
    static_assert with constexpr user messages).  #error/#warning work fine
    on UTF-8 terminals, and so does delete (reason) (we don't translate the
    string literal from SOURCE_CHARSET to exec-charset in that case), as do
    static_assert with a string literal (again, no translation) and the
    __attribute__ form of the deprecated attribute (again,
    !parser->translate_strings_p).
    What doesn't work properly are C++11-style [[...]] attributes (standard
    or gnu::): we do translate those to exec-charset, except for the C++26
    standard deprecated/nodiscard (which aren't translated).  And
    static_assert with user messages doesn't work either; those really have
    to be in exec-charset, because we have no control over how the user
    constructs the messages during constexpr evaluation.
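    As an illustration (not one of the patch's test cases), the
    declarations below exercise the forms just discussed with non-ASCII
    messages.  The function names and French messages are made up.  The
    comments restate the commit message's description of the behaviour
    before this patch.  With the default UTF-8 -fexec-charset, translation
    is a no-op, so the difference only shows with a non-UTF-8 exec charset.
    The C++26 static_assert is guarded by the P2741R3 feature-test value.

```cpp
#include <string_view>

int new_api() { return 0; }

// [[...]] attributes: translated to exec-charset before this patch
// (the standard ones only for C++ < 26).
[[deprecated("préférez new_api()")]] int old_api() { return 1; }
[[gnu::deprecated("préférez new_api()")]] int old_api_gnu() { return 2; }
[[nodiscard("vérifiez le code d'erreur")]] int try_open() { return 0; }

// __attribute__ form: never translated (!parser->translate_strings_p).
__attribute__((deprecated("préférez new_api()"))) int old_api_attr() { return 3; }

// static_assert with a string literal: never translated.
static_assert(sizeof(int) >= 2, "int trop petit");

#if __cpp_static_assert >= 202306L
// C++26 user-generated message: its characters are produced by constant
// evaluation in the ordinary literal encoding (exec-charset).
static_assert(sizeof(long) >= 4, std::string_view("long trop étroit"));
#endif
```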

    So, for C++11 attributes whose first argument is a CPP_STRING, this
    patch temporarily disables translation of that string, which fixes
    [[gnu::deprecated ("foo")]], [[gnu::unavailable ("foo")]] and, for
    C++ < 26, also [[deprecated ("foo")]] and [[nodiscard ("foo")]].
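    The commit message does not show the parser change itself.  One common
    way to temporarily disable a flag such as parser->translate_strings_p
    is a save/restore guard.  The sketch below shows that pattern.  The
    parser_state struct, no_translate_guard class, and
    translating_first_string_arg function are hypothetical stand-ins, not
    GCC's code.

```cpp
// Hypothetical stand-in for the C++ parser state; only the flag matters.
struct parser_state
{
  bool translate_strings_p = true;
};

// Clears translate_strings_p for the guard's lifetime and restores the
// previous value when the guard goes out of scope, even on early return.
class no_translate_guard
{
public:
  explicit no_translate_guard(parser_state &p)
    : parser_(p), saved_(p.translate_strings_p)
  {
    parser_.translate_strings_p = false;
  }
  ~no_translate_guard() { parser_.translate_strings_p = saved_; }
  no_translate_guard(const no_translate_guard &) = delete;
  no_translate_guard &operator=(const no_translate_guard &) = delete;

private:
  parser_state &parser_;
  bool saved_;
};

// Hypothetical: reports whether string translation would be active while
// the first attribute argument is parsed.
bool translating_first_string_arg(parser_state &p, bool first_is_string)
{
  if (!first_is_string)
    return p.translate_strings_p;
  no_translate_guard guard(p);
  return p.translate_strings_p; // read while the guard is still active
}
```

    Restoring the saved value, rather than unconditionally setting the flag
    back to true, keeps the guard correct when it is used in a context
    where translation was already off.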
    The other change is to convert custom user static_assert messages
    (and also inline asm strings) back from exec-charset to SOURCE_CHARSET.
    Without this patch, the worst case for diagnostics is that we show
    garbage, but for inline asm we actually fail to assemble when users
    use constexpr-created string views with non-ASCII exec charsets.
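    The patch does this conversion with libcpp's cpp_translate_string.  As
    a rough analogue only, the sketch below converts bytes in an assumed
    exec charset back to UTF-8 using POSIX iconv(3).  The function name
    exec_to_source_charset is hypothetical.  The sketch assumes a stateless
    source encoding and a 4x output bound, which holds for common charsets.

```cpp
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical analogue of the conversion described above: bytes in the
// ordinary literal encoding (-fexec-charset) -> UTF-8 (SOURCE_CHARSET).
std::string exec_to_source_charset(const std::string &bytes,
                                   const char *exec_charset)
{
  iconv_t cd = iconv_open("UTF-8", exec_charset);
  if (cd == (iconv_t) -1)
    throw std::runtime_error("unsupported charset");

  // Generous bound: common charsets expand by at most 4x into UTF-8.
  std::string out(bytes.size() * 4 + 4, '\0');
  char *in = const_cast<char *>(bytes.data());
  std::size_t in_left = bytes.size();
  char *dst = &out[0];
  std::size_t out_left = out.size();

  std::size_t rc = iconv(cd, &in, &in_left, &dst, &out_left);
  if (rc != (std::size_t) -1)
    rc = iconv(cd, nullptr, nullptr, &dst, &out_left); // flush any state
  iconv_close(cd);
  if (rc == (std::size_t) -1)
    throw std::runtime_error("conversion failed");

  out.resize(out.size() - out_left); // drop the unused tail
  return out;
}
```

    With -fexec-charset=ISO-8859-1, a constexpr-built message containing
    "é" holds the single byte 0xE9.  Printed raw on a UTF-8 terminal, that
    byte is invalid UTF-8 (the "garbage" case).  Converted first, it
    becomes the two bytes C3 A9.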

    2026-01-28  Jakub Jelinek  <[email protected]>

            PR c++/102613
            * parser.cc: Implement C++23 P2246R1 - Character encoding of
            diagnostic text.
            (cp_parser_parenthesized_expression_list): For std attribute
            argument where the first argument is CPP_STRING, ensure the
            string is not translated.
            * semantics.cc: Include c-family/c-pragma.h.
            (cexpr_str::extract): Use cpp_translate_string to translate
            string from ordinary literal encoding to SOURCE_CHARSET.

            * g++.dg/cpp1z/constexpr-asm-6.C: New test.
            * g++.dg/cpp23/charset2.C: New test.
            * g++.dg/cpp23/charset3.C: New test.
            * g++.dg/cpp23/charset4.C: New test.
            * g++.dg/cpp23/charset5.C: New test.
