cor3ntin added inline comments.

================
Comment at: clang/test/SemaCXX/static-assert.cpp:287
+  static_assert((char16_t)L'ゆ' == L"C̵̭̯̠̎͌ͅť̺"[1], ""); // expected-error 
{{failed}} \
+                                                  // expected-note {{evaluates 
to ''ゆ' (0x3086) == '̵' (0x335)'}}
+  static_assert(L"\/"[1] == u'\xFFFD', ""); // expected-error {{failed}} \
----------------
hubert.reinterpretcast wrote:
> cor3ntin wrote:
> > hubert.reinterpretcast wrote:
> > > tahonermann wrote:
> > > > cor3ntin wrote:
> > > > > hubert.reinterpretcast wrote:
> > > > > > hubert.reinterpretcast wrote:
> > > > > > > cor3ntin wrote:
> > > > > > > > hubert.reinterpretcast wrote:
> > > > > > > > > The C++23 escaped string formatting facility would not 
> > > > > > > > > generate a trailing combining character like this. I 
> > > > > > > > > recommend following suit.
> > > > > > > > > 
> > > > > > > > > Info on U+0335: 
> > > > > > > > > https://util.unicode.org/UnicodeJsps/character.jsp?a=0335
> > > > > > > > > 
> > > > > > > > This is way outside the scope of the patch. The diagnostic 
> > > > > > > > output facility has no understanding of combining characters or 
> > > > > > > > graphemes and do not attempt to match std::print. It probably 
> > > > > > > > would be an improvement but this patch is not trying to modify 
> > > > > > > > how all diagnostics are printed. (all of that logic is in 
> > > > > > > > Diagnostic.cpp)
> > > > > > > This patch is pushing the envelope of what appears in 
> > > > > > > diagnostics. One can also argue that someone writing
> > > > > > > ```
> > > > > > > static_assert(false, "\u0301");
> > > > > > > ```
> > > > > > > gets what they deserve, but that case does not have a big problem 
> > > > > > > anyway (because the provided message text appears after `: `).
> > > > > > > 
> > > > > > > This patch increases the exposure of the diagnostic output 
> > > > > > > facility to input that it does not handle well. I disagree that 
> > > > > > > it is outside the scope of this patch to insist that it does not 
> > > > > > > generate such inputs to the diagnostic output facility (even if a 
> > > > > > > possible solution is to modify the diagnostic output facility 
> > > > > > > first).
> > > > > > @cor3ntin, do you have status quo examples for how 
> > > > > > grapheme-extending characters that are not already "problematic" in 
> > > > > > their original context are emitted in diagnostics in contexts where 
> > > > > > they are?
> > > > > Are you looking for that sort of examples? 
> > > > > https://godbolt.org/z/c79xWr7Me
> > > > > That shows that clang has no understanding of graphemes
> > > > gcc and MSVC get that case "right" (probably by accident). 
> > > > https://godbolt.org/z/Tjd6xnEon
> > > I was more looking for cases where the output grapheme includes elements 
> > > that were part of the fixed message text (gobbling quotes, etc.). Also, 
> > > this patch is in the wrong shape for handling this concern in the 
> > > diagnostic printer because the delimiting of the replacement text happens 
> > > in this patch.
> > Could you craft a message that becomes a graphene after substitution by the 
> > engine? Maybe? You would have to try very hard and `static_assert` 
> > diagnostics are not of the right shape.
> > 
> > > This patch increases the exposure of the diagnostic output facility to 
> > > input that it does not handle well. I disagree that it is outside the 
> > > scope of this patch to insist that it does not generate such inputs to 
> > > the diagnostic output facility (even if a possible solution is to modify 
> > > the diagnostic output facility first).
> > 
> > I still don't see it. None of the output produce in this patch are even 
> > close to what i could be problematic. ie this patch is only ever producing 
> > ASCII or single codepoints that gets escaped when they are not printable
> > Could you craft a message that becomes a graphene after substitution by the 
> > engine? Maybe? You would have to try very hard and `static_assert` 
> > diagnostics are not of the right shape.
> 
> That is what I meant by this patch introducing new situations.
> 
> > > This patch increases the exposure of the diagnostic output facility to 
> > > input that it does not handle well. I disagree that it is outside the 
> > > scope of this patch to insist that it does not generate such inputs to 
> > > the diagnostic output facility (even if a possible solution is to modify 
> > > the diagnostic output facility first).
> > 
> > I still don't see it. None of the output produce in this patch are even 
> > close to what i could be problematic. ie this patch is only ever producing 
> > ASCII or single codepoints that gets escaped when they are not printable
> 
> This patch produces single codepoints that are not escaped even when they may 
> combine with a `'` delimiter. This patch also (currently) forms the string 
> with the `'` and the potentially-combining character directly adjacent to 
> each other. The least this patch should do is to emit the 
> potentially-combining character to the diagnostic facility as a separate 
> substitution (that is, if we can agree that the diagnostic facility should 
> consider substitution boundaries as separate text elements; i.e., no 
> graphemes should be formed partially by a substitution).
> 
> Whether the diagnostic facility should use an escape sequence or a ZWNJ at 
> the boundary can be a different discussion.
Sure we could make it a separate message, although we added so much redundant 
information in the crafterdmessage it might be tricky.

But now that I understand your concern it's the fact that if the codepoints is 
a grapheme extend ( so we are printing a char16_t or something with at least as 
many bytes), whether or not it would be rendered as a glyph or be escaped, if 
clang behavior were to escape a printable lone combining character (which it is 
currently not) would depend on whether it is passed directly or not to the diag 
engine. 

Sure. At least now I get what you mean.
I still don't see that has a reason to rework this patch yet again, there are 
too many ifs for it to be something users are likely to encounter, and it 
requires clang features that are just not there and that we do not have plans 
to implementation beyond "wouldn't it be nice if"

Would you be happy with a fixme in the code?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D155610/new/

https://reviews.llvm.org/D155610

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to