https://gcc.gnu.org/g:bb5ebc937329196eca404385b4388352ae568a86

commit r16-7104-gbb5ebc937329196eca404385b4388352ae568a86
Author: Jakub Jelinek <[email protected]>
Date:   Wed Jan 28 10:23:20 2026 +0100

    c++: Implement C++23 P2246R1 - Character encoding of diagnostic text
    
    The following patch attempts to implement the C++23 P2246R1
    Character encoding of diagnostic text paper.
    Initially I thought there was nothing to do, but this patch shows
    that there is (and I wonder whether we should backport it to release
    branches).  The patch is on top of the cpp_translate_string libcpp
    addition from the reflection patchset, though that is quite a small
    change that could be backported too.
    
    We have various different encodings in play in GCC.
    There is -finput-charset= defaulting to SOURCE_CHARSET, which is
    almost always UTF-8 (but in theory could be UTF-EBCDIC if that really
    works).  libcpp converts source from the input charset to SOURCE_CHARSET
    initially.  And then we have -fexec-charset=, again defaulting to
    SOURCE_CHARSET, -fwide-exec-charset=, then UTF-8, UTF-16 and UTF-32
    for u8, u and U string literals and constants, and finally whatever
    character set the user has in the terminal in which gcc is running.
    
    Now, I think we mostly just emit diagnostics in SOURCE_CHARSET;
    there is the identifier_to_locale function, which uses UCNs if the
    LC_CTYPE CODESET is not UTF-8-ish, but I think we don't use it all
    the time.  Even then, there is really no support for outputting from
    SOURCE_CHARSET UTF-8 to non-ASCII compatible terminal charsets.
    So for now let's pretend that we are emitting diagnostics to a UTF-8
    capable terminal.
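
The UCN fallback mentioned above can be illustrated with a small sketch.
This is not GCC's identifier_to_locale; utf8_to_ucn is a hypothetical
helper, and the exact escape spelling (lowercase hex, \u versus \U) is an
assumption made for the example.  It skips overlong-encoding and surrogate
checks to stay short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not GCC code): copy ASCII bytes through and replace
// each multi-byte UTF-8 sequence with a \uXXXX or \UXXXXXXXX escape.
// Malformed or truncated sequences are replaced by '?', one byte at a time.
std::string
utf8_to_ucn (const std::string &in)
{
  std::string out;
  for (std::size_t i = 0; i < in.size (); )
    {
      unsigned char b = static_cast<unsigned char> (in[i]);
      if (b < 0x80)
        {
          out.push_back (static_cast<char> (b));
          ++i;
          continue;
        }
      // Number of continuation bytes implied by the lead byte.
      int extra = -1;
      if ((b >> 5) == 0x6)
        extra = 1;
      else if ((b >> 4) == 0xE)
        extra = 2;
      else if ((b >> 3) == 0x1E)
        extra = 3;
      bool ok = extra > 0 && i + extra < in.size ();
      unsigned long cp = ok ? (b & (0x3F >> extra)) : 0;
      for (int k = 1; ok && k <= extra; ++k)
        {
          unsigned char c = static_cast<unsigned char> (in[i + k]);
          if ((c & 0xC0) != 0x80)
            ok = false;
          else
            cp = (cp << 6) | (c & 0x3F);
        }
      if (!ok)
        {
          out.push_back ('?');
          ++i;
          continue;
        }
      char buf[16];
      int n;
      if (cp <= 0xFFFF)
        n = std::snprintf (buf, sizeof buf, "\\u%04lx", cp);
      else
        n = std::snprintf (buf, sizeof buf, "\\U%08lx", cp);
      assert (n > 0 && n < static_cast<int> (sizeof buf));
      out += buf;
      i += extra + 1;
    }
  return out;
}
```

With this sketch, the UTF-8 bytes for the "áæ" string used in the new tests
would be rendered as two \u escapes on a terminal that cannot show them.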
    
    When reporting errors about identifiers in the source (which are in
    SOURCE_CHARSET), we just emit those.  The paper talks about
    deprecated & nodiscard attribute msgs, static_assert, #error (and for
    C++26 it would talk about #warning, delete (reason) and static_assert
    with constexpr user messages).  #error/#warning work fine on UTF-8
    terminals, delete (reason) too (we don't translate the string literal
    from SOURCE_CHARSET to exec-charset in that case), static_assert
    with a string literal too (again, notranslate), __attribute__ form
    of deprecated attribute too (again, !parser->translate_strings_p).
    What doesn't work properly are C++11 attributes (standard or gnu::);
    we do translate those to exec charset, except for the C++26
    standard deprecated/nodiscard (which aren't translated).  And static_assert
    with user messages doesn't work; those really have to be in exec-charset
    because we have no control over how the user constructs the messages
    during constexpr evaluation.
    
    So, for C++11 attributes whose first argument is a CPP_STRING, this
    patch temporarily disables translation of that string, which
    fixes [[gnu::deprecated ("foo")]], [[gnu::unavailable ("foo")]]
    and for C++ < 26 also [[deprecated ("foo")]] and [[nodiscard ("foo")]].
    The other change converts the custom user static_assert messages (and
    also inline asm strings) back from exec-charset to SOURCE_CHARSET.
    Without this patch, the worst case for diagnostics is that we show
    garbage, but for inline asm we actually fail to assemble when users
    use constexpr-created string views with non-ASCII exec charsets.
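
To illustrate why the conversion back matters, here is a minimal sketch of
the round trip.  It does not use libcpp or cpp_translate_string; the helpers
to_exec, to_source, encode_exec and decode_exec are hypothetical and cover
only lowercase letters and space, using the EBCDIC layout shared by IBM1047
(a-i at 0x81, j-r at 0x91, s-z at 0xA2, space at 0x40).  All other bytes are
passed through unchanged, which a real converter would not do.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map one source-charset character to its
// EBCDIC-style execution-charset byte (lowercase letters and space only).
constexpr unsigned char
to_exec (char c)
{
  if (c >= 'a' && c <= 'i') return 0x81 + (c - 'a');
  if (c >= 'j' && c <= 'r') return 0x91 + (c - 'j');
  if (c >= 's' && c <= 'z') return 0xA2 + (c - 's');
  if (c == ' ') return 0x40;
  return static_cast<unsigned char> (c);
}

// Hypothetical inverse of to_exec for the same limited repertoire.
constexpr char
to_source (unsigned char b)
{
  if (b >= 0x81 && b <= 0x89) return 'a' + (b - 0x81);
  if (b >= 0x91 && b <= 0x99) return 'j' + (b - 0x91);
  if (b >= 0xA2 && b <= 0xA9) return 's' + (b - 0xA2);
  if (b == 0x40) return ' ';
  return static_cast<char> (b);
}

static_assert (to_exec ('f') == 0x86, "'f' is 0x86 in this layout");
static_assert (to_source (to_exec ('o')) == 'o', "round trip");

// What the compiler would store for a translated string literal.
std::string
encode_exec (const std::string &s)
{
  std::string out;
  for (char c : s)
    out.push_back (static_cast<char> (to_exec (c)));
  return out;
}

// What has to happen before the bytes are shown in a diagnostic or
// handed to the assembler.
std::string
decode_exec (const std::string &s)
{
  std::string out;
  for (char c : s)
    out.push_back (to_source (static_cast<unsigned char> (c)));
  return out;
}

// Usage example: the encoded bytes are unreadable as-is, the decoded
// ones match the original message again.
void
demo ()
{
  std::string stored = encode_exec ("foo bar");
  assert (stored != "foo bar");
  assert (decode_exec (stored) == "foo bar");
}
```

Printing the stored bytes directly corresponds to the "garbage" case
described above; decoding first corresponds to what the patch does for
constexpr-produced static_assert messages and inline asm strings.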
    
    2026-01-28  Jakub Jelinek  <[email protected]>
    
            PR c++/102613
            * parser.cc: Implement C++23 P2246R1 - Character encoding of
            diagnostic text.
            (cp_parser_parenthesized_expression_list): For std attribute
            argument where the first argument is CPP_STRING, ensure the
            string is not translated.
            * semantics.cc: Include c-family/c-pragma.h.
            (cexpr_str::extract): Use cpp_translate_string to translate
            string from ordinary literal encoding to SOURCE_CHARSET.
    
            * g++.dg/cpp1z/constexpr-asm-6.C: New test.
            * g++.dg/cpp23/charset2.C: New test.
            * g++.dg/cpp23/charset3.C: New test.
            * g++.dg/cpp23/charset4.C: New test.
            * g++.dg/cpp23/charset5.C: New test.

Diff:
---
 gcc/cp/parser.cc                             | 11 +++++++++
 gcc/cp/semantics.cc                          | 19 +++++++++++++++
 gcc/testsuite/g++.dg/cpp1z/constexpr-asm-6.C | 34 ++++++++++++++++++++++++++
 gcc/testsuite/g++.dg/cpp23/charset2.C        | 36 ++++++++++++++++++++++++++++
 gcc/testsuite/g++.dg/cpp23/charset3.C        | 24 +++++++++++++++++++
 gcc/testsuite/g++.dg/cpp23/charset4.C        | 36 ++++++++++++++++++++++++++++
 gcc/testsuite/g++.dg/cpp23/charset5.C        | 24 +++++++++++++++++++
 7 files changed, 184 insertions(+)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 6e310f2c0fde..2891856098c0 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -9868,6 +9868,17 @@ cp_parser_parenthesized_expression_list (cp_parser* parser,
                expression_list->quick_push (arg);
            goto get_comma;
          }
+       else if (is_attribute_list == normal_attr
+                && cp_lexer_next_token_is (parser->lexer, CPP_STRING)
+                && (cp_lexer_nth_token_is (parser->lexer, 2, CPP_COMMA)
+                    || cp_lexer_nth_token_is (parser->lexer, 2, CPP_CLOSE_PAREN)))
+         {
+           auto t = make_temp_override (parser->translate_strings_p, false);
+           expr
+             = cp_parser_parenthesized_expression_list_elt (parser, cast_p,
+                                                            allow_expansion_p,
+                                                            non_constant_p);
+         }
        else
          expr
            = cp_parser_parenthesized_expression_list_elt (parser, cast_p,
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 35bc48e49dc0..3e1a86fae6ca 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "memmodel.h"
 #include "gimplify.h"
 #include "contracts.h"
+#include "c-family/c-pragma.h"
 
 /* There routines provide a modular interface to perform many parsing
    operations.  They may therefore be used during actual parsing, or
@@ -12855,6 +12856,24 @@ cexpr_str::extract (location_t location, const char * & msg, int &len)
              return false;
            }
        }
+      /* Convert the string from execution charset to SOURCE_CHARSET.  */
+      cpp_string istr, ostr;
+      istr.len = len;
+      istr.text = (const unsigned char *) msg;
+      if (!cpp_translate_string (parse_in, &istr, &ostr, CPP_STRING, true))
+       {
+         error_at (location, "could not convert constexpr string from "
+                             "ordinary literal encoding to source character "
+                             "set");
+         return false;
+       }
+      else
+       {
+         if (buf)
+           XDELETEVEC (buf);
+         msg = buf = const_cast <char *> ((const char *) ostr.text);
+         len = ostr.len;
+       }
     }
   else
     {
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-6.C b/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-6.C
new file mode 100644
index 000000000000..ca435f5ed452
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-asm-6.C
@@ -0,0 +1,34 @@
+/* { dg-do compile { target c++17 } } */
+/* { dg-skip-if "requires hosted libstdc++ for string" { ! hostedlib } } */
+// { dg-require-iconv "IBM1047" }
+// { dg-options "-fexec-charset=IBM1047" }
+
+#include <string>
+
+constexpr std::string_view genfoo ()
+{
+  return "foo %1,%0";
+}
+
+constexpr std::string_view genoutput ()
+{
+  return "=r";
+}
+
+constexpr std::string_view geninput ()
+{
+  return "r";
+}
+
+constexpr std::string_view genclobber ()
+{
+  return "memory";
+}
+
+void f()
+{
+  int a;
+  asm((genfoo ()) : (genoutput ()) (a) : (geninput ()) (1) : (genclobber ()));
+}
+
+/* { dg-final { scan-assembler "foo" } } */
diff --git a/gcc/testsuite/g++.dg/cpp23/charset2.C b/gcc/testsuite/g++.dg/cpp23/charset2.C
new file mode 100644
index 000000000000..8230b442b17c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/charset2.C
@@ -0,0 +1,36 @@
+// P2246R1
+// { dg-do compile { target c++23 } }
+// { dg-require-iconv "IBM1047" }
+// { dg-options "-pedantic-errors -fexec-charset=IBM1047" }
+
+[[deprecated ("foo")]] int d;          // { dg-message "declared here" }
+int e = d;                             // { dg-warning "'d' is deprecated: foo" }
+static_assert (false, "bar");          // { dg-error "static assertion failed: bar" }
+#error "baz"                           // { dg-error "#error \"baz\"" }
+[[nodiscard ("qux")]] int foo ();      // { dg-message "declared here" }
+void
+bar ()
+{
+  foo ();                              // { dg-warning "ignoring return value of 'int foo\\\(\\\)', declared with attribute 'nodiscard': 'qux'" }
+}
+#if __cplusplus > 202302L
+#warning "fred"                                // { dg-warning "#warning \"fred\"" "" { target c++26 } }
+#endif
+#if __cpp_static_assert >= 202306L
+struct A { constexpr int size () const { return 5; }
+           constexpr const char *data () const { return "xyzzy"; } };
+static_assert (false, A {});           // { dg-error "static assertion failed: xyzzy" "" { target c++26 } }
+#endif
+#if __cpp_deleted_function >= 202403L
+int baz () = delete ("garply");                // { dg-message "declared here" "" { target c++26 } }
+void
+plugh ()
+{
+  baz ();                              // { dg-error "use of deleted function 'int baz\\\(\\\)': garply" "" { target c++26 } }
+}
+#endif
+namespace [[deprecated ("corge")]] ND  // { dg-message "declared here" }
+{
+  int i;
+};
+int j = ND::i;                         // { dg-warning "'ND' is deprecated: corge" }
diff --git a/gcc/testsuite/g++.dg/cpp23/charset3.C b/gcc/testsuite/g++.dg/cpp23/charset3.C
new file mode 100644
index 000000000000..fd9e1585d90e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/charset3.C
@@ -0,0 +1,24 @@
+// P2246R1
+// { dg-do compile { target c++11 } }
+// { dg-require-iconv "IBM1047" }
+// { dg-options "-fexec-charset=IBM1047" }
+
+[[gnu::deprecated ("foo")]] int d;     // { dg-message "declared here" }
+int e = d;                             // { dg-warning "'d' is deprecated: foo" }
+[[gnu::unavailable ("bar")]] int f;    // { dg-message "declared here" }
+int g = f;                             // { dg-error "'f' is unavailable: bar" }
+__attribute__((deprecated ("baz"))) int h; // { dg-message "declared here" }
+int i = h;                             // { dg-warning "'h' is deprecated: baz" }
+__attribute__((unavailable ("qux"))) int j;    // { dg-message "declared here" }
+int k = j;                             // { dg-error "'j' is unavailable: qux" }
+#warning "fred"                                // { dg-warning "#warning \"fred\"" }
+namespace [[gnu::deprecated ("corge")]] ND // { dg-message "declared here" }
+{
+  int l;
+};
+int m = ND::l;                         // { dg-warning "'ND' is deprecated: corge" }
+namespace __attribute__((deprecated ("xyzzy"))) NE // { dg-message "declared here" }
+{
+  int l;
+};
+int n = NE::l;                         // { dg-warning "'NE' is deprecated: xyzzy" }
diff --git a/gcc/testsuite/g++.dg/cpp23/charset4.C b/gcc/testsuite/g++.dg/cpp23/charset4.C
new file mode 100644
index 000000000000..e79188225b48
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/charset4.C
@@ -0,0 +1,36 @@
+// P2246R1
+// { dg-do compile { target c++23 } }
+// { dg-require-iconv "UTF-8" }
+// { dg-options "-pedantic-errors -fexec-charset=UTF-8" }
+
+[[deprecated ("áæ)")]] int d;          // { dg-message "declared here" }
+int e = d;                             // { dg-warning "'d' is deprecated: áæ" }
+static_assert (false, "áæ");           // { dg-error "static assertion failed: áæ" }
+#error "áæ"                            // { dg-error "#error \"áæ\"" }
+[[nodiscard ("áæ")]] int foo ();       // { dg-message "declared here" }
+void
+bar ()
+{
+  foo ();                              // { dg-warning "ignoring return value of 'int foo\\\(\\\)', declared with attribute 'nodiscard': 'áæ'" }
+}
+#if __cplusplus > 202302L
+#warning "áæ"                          // { dg-warning "#warning \"áæ\"" "" { target c++26 } }
+#endif
+#if __cpp_static_assert >= 202306L
+struct A { constexpr int size () const { return sizeof ("áæ") - 1; }
+           constexpr const char *data () const { return "áæ"; } };
+static_assert (false, A {});           // { dg-error "static assertion failed: áæ" "" { target c++26 } }
+#endif
+#if __cpp_deleted_function >= 202403L
+int baz () = delete ("áæ");            // { dg-message "declared here" "" { target c++26 } }
+void
+plugh ()
+{
+  baz ();                              // { dg-error "use of deleted function 'int baz\\\(\\\)': áæ" "" { target c++26 } }
+}
+#endif
+namespace [[deprecated ("áæ")]] ND     // { dg-message "declared here" }
+{
+  int i;
+};
+int j = ND::i;                         // { dg-warning "'ND' is deprecated: áæ" }
diff --git a/gcc/testsuite/g++.dg/cpp23/charset5.C b/gcc/testsuite/g++.dg/cpp23/charset5.C
new file mode 100644
index 000000000000..06766937142a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp23/charset5.C
@@ -0,0 +1,24 @@
+// P2246R1
+// { dg-do compile { target c++11 } }
+// { dg-require-iconv "UTF-8" }
+// { dg-options "-fexec-charset=UTF-8" }
+
+[[gnu::deprecated ("áæ")]] int d;      // { dg-message "declared here" }
+int e = d;                             // { dg-warning "'d' is deprecated: áæ" }
+[[gnu::unavailable ("áæ")]] int f;     // { dg-message "declared here" }
+int g = f;                             // { dg-error "'f' is unavailable: áæ" }
+__attribute__((deprecated ("áæ"))) int h; // { dg-message "declared here" }
+int i = h;                             // { dg-warning "'h' is deprecated: áæ" }
+__attribute__((unavailable ("áæ"))) int j;     // { dg-message "declared here" }
+int k = j;                             // { dg-error "'j' is unavailable: áæ" }
+#warning "áæ"                          // { dg-warning "#warning \"áæ\"" }
+namespace [[gnu::deprecated ("áæ")]] ND // { dg-message "declared here" }
+{
+  int l;
+};
+int m = ND::l;                         // { dg-warning "'ND' is deprecated: áæ" }
+namespace __attribute__((deprecated ("áæ"))) NE // { dg-message "declared here" }
+{
+  int l;
+};
+int n = NE::l;                         // { dg-warning "'NE' is deprecated: áæ" }
