hubert.reinterpretcast added inline comments.

================
Comment at: clang/test/SemaCXX/static-assert.cpp:287
+  static_assert((char16_t)L'ゆ' == L"C̵̭̯̠̎͌ͅť̺"[1], ""); // expected-error 
{{failed}} \
+                                                  // expected-note {{evaluates 
to ''ゆ' (0x3086) == '̵' (0x335)'}}
+  static_assert(L"\/"[1] == u'\xFFFD', ""); // expected-error {{failed}} \
----------------
cor3ntin wrote:
> hubert.reinterpretcast wrote:
> > tahonermann wrote:
> > > cor3ntin wrote:
> > > > hubert.reinterpretcast wrote:
> > > > > hubert.reinterpretcast wrote:
> > > > > > cor3ntin wrote:
> > > > > > > hubert.reinterpretcast wrote:
> > > > > > > > The C++23 escaped string formatting facility would not generate 
> > > > > > > > a trailing combining character like this. I recommend following 
> > > > > > > > suit.
> > > > > > > > 
> > > > > > > > Info on U+0335: 
> > > > > > > > https://util.unicode.org/UnicodeJsps/character.jsp?a=0335
> > > > > > > > 
> > > > > > > This is way outside the scope of the patch. The diagnostic output 
> > > > > > > facility has no understanding of combining characters or 
> > > > > > > graphemes and do not attempt to match std::print. It probably 
> > > > > > > would be an improvement but this patch is not trying to modify 
> > > > > > > how all diagnostics are printed. (all of that logic is in 
> > > > > > > Diagnostic.cpp)
> > > > > > This patch is pushing the envelope of what appears in diagnostics. 
> > > > > > One can also argue that someone writing
> > > > > > ```
> > > > > > static_assert(false, "\u0301");
> > > > > > ```
> > > > > > gets what they deserve, but that case does not have a big problem 
> > > > > > anyway (because the provided message text appears after `: `).
> > > > > > 
> > > > > > This patch increases the exposure of the diagnostic output facility 
> > > > > > to input that it does not handle well. I disagree that it is 
> > > > > > outside the scope of this patch to insist that it does not generate 
> > > > > > such inputs to the diagnostic output facility (even if a possible 
> > > > > > solution is to modify the diagnostic output facility first).
> > > > > @cor3ntin, do you have status quo examples for how grapheme-extending 
> > > > > characters that are not already "problematic" in their original 
> > > > > context are emitted in diagnostics in contexts where they are?
> > > > Are you looking for that sort of examples? 
> > > > https://godbolt.org/z/c79xWr7Me
> > > > That shows that clang has no understanding of graphemes
> > > gcc and MSVC get that case "right" (probably by accident). 
> > > https://godbolt.org/z/Tjd6xnEon
> > I was more looking for cases where the output grapheme includes elements 
> > that were part of the fixed message text (gobbling quotes, etc.). Also, 
> > this patch is in the wrong shape for handling this concern in the 
> > diagnostic printer because the delimiting of the replacement text happens 
> > in this patch.
> Could you craft a message that becomes a graphene after substitution by the 
> engine? Maybe? You would have to try very hard and `static_assert` 
> diagnostics are not of the right shape.
> 
> > This patch increases the exposure of the diagnostic output facility to 
> > input that it does not handle well. I disagree that it is outside the scope 
> > of this patch to insist that it does not generate such inputs to the 
> > diagnostic output facility (even if a possible solution is to modify the 
> > diagnostic output facility first).
> 
> I still don't see it. None of the output produce in this patch are even close 
> to what i could be problematic. ie this patch is only ever producing ASCII or 
> single codepoints that gets escaped when they are not printable
> Could you craft a message that becomes a graphene after substitution by the 
> engine? Maybe? You would have to try very hard and `static_assert` 
> diagnostics are not of the right shape.

That is what I meant by this patch introducing new situations.

> > This patch increases the exposure of the diagnostic output facility to 
> > input that it does not handle well. I disagree that it is outside the scope 
> > of this patch to insist that it does not generate such inputs to the 
> > diagnostic output facility (even if a possible solution is to modify the 
> > diagnostic output facility first).
> 
> I still don't see it. None of the output produce in this patch are even close 
> to what i could be problematic. ie this patch is only ever producing ASCII or 
> single codepoints that gets escaped when they are not printable

This patch produces single codepoints that are not escaped even when they may 
combine with a `'` delimiter. This patch also (currently) forms the string with 
the `'` and the potentially-combining character directly adjacent to each 
other. The least this patch should do is to emit the potentially-combining 
character to the diagnostic facility as a separate substitution (that is, if we 
can agree that the diagnostic facility should consider substitution boundaries 
as separate text elements; i.e., no graphemes should be formed partially by a 
substitution).

Whether the diagnostic facility should use an escape sequence or a ZWNJ at the 
boundary can be a different discussion.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155610/new/

https://reviews.llvm.org/D155610

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to