hubert.reinterpretcast added inline comments.
================ Comment at: clang/test/SemaCXX/static-assert.cpp:287 + static_assert((char16_t)L'ゆ' == L"C̵̭̯̠̎͌ͅť̺"[1], ""); // expected-error {{failed}} \ + // expected-note {{evaluates to ''ゆ' (0x3086) == '̵' (0x335)'}} + static_assert(L"\/"[1] == u'\xFFFD', ""); // expected-error {{failed}} \ ---------------- cor3ntin wrote: > hubert.reinterpretcast wrote: > > tahonermann wrote: > > > cor3ntin wrote: > > > > hubert.reinterpretcast wrote: > > > > > hubert.reinterpretcast wrote: > > > > > > cor3ntin wrote: > > > > > > > hubert.reinterpretcast wrote: > > > > > > > > The C++23 escaped string formatting facility would not generate > > > > > > > > a trailing combining character like this. I recommend following > > > > > > > > suit. > > > > > > > > > > > > > > > > Info on U+0335: > > > > > > > > https://util.unicode.org/UnicodeJsps/character.jsp?a=0335 > > > > > > > > > > > > > > > This is way outside the scope of the patch. The diagnostic output > > > > > > > facility has no understanding of combining characters or > > > > > > > graphemes and do not attempt to match std::print. It probably > > > > > > > would be an improvement but this patch is not trying to modify > > > > > > > how all diagnostics are printed. (all of that logic is in > > > > > > > Diagnostic.cpp) > > > > > > This patch is pushing the envelope of what appears in diagnostics. > > > > > > One can also argue that someone writing > > > > > > ``` > > > > > > static_assert(false, "\u0301"); > > > > > > ``` > > > > > > gets what they deserve, but that case does not have a big problem > > > > > > anyway (because the provided message text appears after `: `). > > > > > > > > > > > > This patch increases the exposure of the diagnostic output facility > > > > > > to input that it does not handle well. I disagree that it is > > > > > > outside the scope of this patch to insist that it does not generate > > > > > > such inputs to the diagnostic output facility (even if a possible > > > > > > solution is to modify the diagnostic output facility first). > > > > > @cor3ntin, do you have status quo examples for how grapheme-extending > > > > > characters that are not already "problematic" in their original > > > > > context are emitted in diagnostics in contexts where they are? > > > > Are you looking for that sort of examples? > > > > https://godbolt.org/z/c79xWr7Me > > > > That shows that clang has no understanding of graphemes > > > gcc and MSVC get that case "right" (probably by accident). > > > https://godbolt.org/z/Tjd6xnEon > > I was more looking for cases where the output grapheme includes elements > > that were part of the fixed message text (gobbling quotes, etc.). Also, > > this patch is in the wrong shape for handling this concern in the > > diagnostic printer because the delimiting of the replacement text happens > > in this patch. > Could you craft a message that becomes a graphene after substitution by the > engine? Maybe? You would have to try very hard and `static_assert` > diagnostics are not of the right shape. > > > This patch increases the exposure of the diagnostic output facility to > > input that it does not handle well. I disagree that it is outside the scope > > of this patch to insist that it does not generate such inputs to the > > diagnostic output facility (even if a possible solution is to modify the > > diagnostic output facility first). > > I still don't see it. None of the output produce in this patch are even close > to what i could be problematic. ie this patch is only ever producing ASCII or > single codepoints that gets escaped when they are not printable > Could you craft a message that becomes a graphene after substitution by the > engine? Maybe? You would have to try very hard and `static_assert` > diagnostics are not of the right shape. That is what I meant by this patch introducing new situations. > > This patch increases the exposure of the diagnostic output facility to > > input that it does not handle well. I disagree that it is outside the scope > > of this patch to insist that it does not generate such inputs to the > > diagnostic output facility (even if a possible solution is to modify the > > diagnostic output facility first). > > I still don't see it. None of the output produce in this patch are even close > to what i could be problematic. ie this patch is only ever producing ASCII or > single codepoints that gets escaped when they are not printable This patch produces single codepoints that are not escaped even when they may combine with a `'` delimiter. This patch also (currently) forms the string with the `'` and the potentially-combining character directly adjacent to each other. The least this patch should do is to emit the potentially-combining character to the diagnostic facility as a separate substitution (that is, if we can agree that the diagnostic facility should consider substitution boundaries as separate text elements; i.e., no graphemes should be formed partially by a substitution). Whether the diagnostic facility should use an escape sequence or a ZWNJ at the boundary can be a different discussion. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155610/new/ https://reviews.llvm.org/D155610 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits