hazohelet added a comment. In D155610#4550346 <https://reviews.llvm.org/D155610#4550346>, @cor3ntin wrote:
> I've been thinking about it and I think I have a cleaner design for the > printing of characters: > > We need a `CharToString(unsigned, Qualtype) -> SmallString` method that takes > a value and the type. > for `char` and `char8_t` we can just return the value. > For `wchar_t`, `char32_t` and `char16_t`, we can use something like > `ConvertCodePointToUTF8` to convert to UTF-8. If that fails we can escape > with `\x` > If we pass the result of that to the diagnostic engine, escaping of non > printing character would happen automatically. > > That way we have a nice separation between converting an APValue to string > and printing it, which should avoid code duplication quite a bit, and > generally make the design nicer. > Then maybe we need to consider whether we want to modify > `CharacterLiteral::print` to be aligned with all of that. I don;t know if > that's used, for example, for mangling. > > Given there are a bunch of different issues here, i would not mind separate > PRs - having the numerical value showed in paren seems valuable on its own. Thanks for the suggestion. Printing of multibyte character is a bit out of the scope of the original goal of this patch, but I'll give it a try. ================ Comment at: clang/lib/Basic/Diagnostic.cpp:838-858 + if (UseUCN) + OutStream << "\\u" + << llvm::format_hex_no_prefix(CodepointValue, /*Width=*/4, + /*Upper=*/false); + else + OutStream << "<U+" + << llvm::format_hex_no_prefix(CodepointValue, /*Width=*/4, ---------------- cor3ntin wrote: > hazohelet wrote: > > cor3ntin wrote: > > > The use UCN addition is probably not justified. we should consistent in > > > how we print the value of non-printable code points. > > My motivation here is to print a valid character literal. I think it > > justifies this change somewhat. I'd like to see what others think about > > this. > Why? there is no expectation that diagnostics messages reproduce C++. > Consistency between string literals and characters literals is a lot more > important The diagnostic meesage here looks like `expression evaluates to 'VALUE1 == VALUE2'` I tend to expect that `VALUE1 == VALUE2` is a syntactically valid expression because of the syntactical element `==`. But if others do not feel the same way, I am okay with something like `'<U+0001>' == '<U+0002>'` CHANGES SINCE LAST ACTION https://reviews.llvm.org/D155610/new/ https://reviews.llvm.org/D155610 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits