barannikov88 added a comment.

In D154290#4483055 <https://reviews.llvm.org/D154290#4483055>, @cor3ntin wrote:

> In D154290#4482975 <https://reviews.llvm.org/D154290#4482975>, @barannikov88 
> wrote:
>
>> According to the current wording, the static_assert-message is either an
>> unevaluated string or an expression evaluated at compile time.
>> Unevaluated strings don't allow certain escape sequences, but if I wrap the
>> string in a string_view-like class, I'm allowed to use any escape sequences,
>> including '\x'.
>> Moreover, wrapping a string in a class would change its encoding. 
>> Unevaluated strings are displayed as written in the source (that is, UTF-8), 
>> while wrapped strings undergo conversion to execution encoding (e.g. EBCDIC) 
>> and then printed in system locale, leading to mojibake.
>
> Not quite.
> Unevaluated strings are always UTF-8 (regardless of source file encoding).
> Evaluated strings are in the literal encoding, which is always UTF-8 for
> clang.
> This will change whenever we allow for different kinds of literal encodings
> per this RFC:
> https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512/1
>
> If and when that is the case we will have to convert back to UTF-8 before 
> displaying - and then maybe convert back to the system locale depending on 
> host.
> Numeric escape sequences can then occur in evaluated strings and produce
> mojibake if the evaluated string is not valid in the string literal encoding.
> I don't believe that we would want to output static messages without 
> conversion on any system as the diagnostics framework is very much geared 
> towards UTF-8 and we want to keep supporting cross compilation.
>
> So the process will be
> source -> utf8 -> literal encoding -> utf8 -> terminal encoding.

Thanks for your reply, I think I see the idea.
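
If I read the pipeline correctly, the "literal encoding -> utf8" step would look
roughly like the sketch below. This is only an illustration under stated
assumptions: ISO-8859-1 stands in for some future non-UTF-8 literal encoding
(clang only uses UTF-8 today), and `latin1_to_utf8` is a hypothetical helper,
not an existing clang or LLVM API.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper, not a clang API: treat the bytes of an evaluated
// message as ISO-8859-1 (an assumed stand-in for a non-UTF-8 literal
// encoding) and convert them back to UTF-8 for the diagnostics engine.
inline std::string latin1_to_utf8(std::string_view literal_bytes) {
    std::string utf8;
    utf8.reserve(literal_bytes.size() * 2);
    for (char ch : literal_bytes) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b < 0x80) {
            // ASCII maps to itself.
            utf8 += static_cast<char>(b);
        } else {
            // U+0080..U+00FF become a two-byte UTF-8 sequence.
            utf8 += static_cast<char>(0xC0 | (b >> 6));
            utf8 += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return utf8;
}
```

For example, `latin1_to_utf8("\xE9")` yields the two bytes `\xC3\xA9`; a real
implementation would presumably go through a general conversion utility
rather than a hard-coded mapping like this one.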

> By the same account, casting 0-extended UTF-8 to char is fine until such time
> as clang supports more than UTF-8 (which is one of the reasons we need to make
> sure clang's conversion utilities can convert from and to UTF-8).
>
> Unevaluated strings were introduced in part to help identify what gets 
> converted and what does not.

It is a bit strange that the string in `static_assert(false, "й")` is not 
converted, while it is converted in `static_assert(false, 
std::string_view("й"))`.
It might be possible to achieve identical diagnostic output even with 
-fexec-charset supported (which would only affect the second form),
but right now I'm confused by the distinction… Why not always evaluate the
message?
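
To make the two forms concrete, here is a minimal sketch in plain C++17 (so it
does not use the new static_assert form itself, only shows it in comments).
`hex_bytes`, `kEvaluated` and `kLoneByte` are names I made up for illustration,
and the code assumes a UTF-8 literal encoding, as is currently the case for
clang.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Form 1 (unevaluated string): static_assert(false, "й");
//   The message is never converted and numeric escapes such as '\x' are
//   not allowed in it.
// Form 2 (evaluated message):  static_assert(false, std::string_view("й"));
//   The message is an ordinary string literal in the literal encoding, so
//   numeric escapes are allowed, as in the constants below.

// Hypothetical helper: hex-dump the code units of an evaluated string.
inline std::string hex_bytes(std::string_view s) {
    static constexpr char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (i != 0) out += ' ';
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// The UTF-8 code units of U+0439 (й), spelled with numeric escapes.
constexpr std::string_view kEvaluated("\xD0\xB9");
static_assert(kEvaluated.size() == 2);

// A lone lead byte: fine as an evaluated string, but not well-formed UTF-8,
// so printing it as-is would produce mojibake.
constexpr std::string_view kLoneByte("\xD0");
static_assert(kLoneByte.size() == 1);
```

Here `hex_bytes(kEvaluated)` gives `D0 B9`, the same bytes an unevaluated "й"
would carry, while `kLoneByte` is something only the second form can express.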


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154290/new/

https://reviews.llvm.org/D154290
