On Tue, Nov 02, 2021 at 12:56:53PM +0100, Jakub Jelinek wrote:
> Consider attached testcases Whomoglyph1.C and Whomoglyph2.C.
> On Whomoglyph1.C testcase, I'd expect a warning, because there is a clear
> confusion for the reader, something that isn't visible in any of emacs, vim,
> joe editors or on the terminal, when f3 uses scope identifier, the casual
> reader will expect that it uses N1::N2::scope, but there is no such
> variable, only one N1::N2::ѕсоре that visually looks the same, but has
> different UTF-8 chars in it.  So, name lookup will instead find N1::scope
> and use that.
> But Whomoglyph2.C will emit warnings that are IMHO not appropriate,
> I believe there is no confusion at all there, e.g. for both C and C++,
> the f5/f6 case, it doesn't really matter how each of the function names its
> own parameter, one can never access another function's parameter.
> Ditto for different namespace provided that both namespaces aren't searched
> in the same name lookup, or similarly classes etc.
> So, IMNSHO that warning belongs to name-lookup (cp/name-lookup.c for the C++
> FE).
> And, another important thing is that most users don't really use unicode in
> identifiers, I bet over 99.9% of identifiers don't have any >= 0x80
> characters in it and even when people do use them, confusable identifiers
> during the same lookup are even far more unlikely.
> So, I think we should optimize for the common case, ASCII only identifiers
> and spend as little compile time as possible on this stuff.

If we keep doing it in the stringpool, then e.g. one couldn't #include
<zlib.h> without triggering the warning in a program with
Russian/Ukrainian/Serbian etc. identifiers where some parameter or
automatic variable in some function in that file is called с (Cyrillic
small letter es), just because in zlib.h one of the arguments in one of
the function prototypes is called c (Latin small letter c).
I'm afraid most of the users who actually want to use UTF-8 or UCNs in
their identifiers would then simply have to disable this warning...

	Jakub
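
To make the zlib.h case concrete, here is a minimal sketch. The function
names are made up for illustration and are not the real zlib API, and the
Cyrillic parameter is spelled as a UCN (\u0441) only so that the difference
is visible when reading; it assumes a compiler that accepts UCNs in
identifiers, as recent GCC does. The two parameters look alike, but no
single name lookup can ever see both of them, so there is nothing to
confuse.

```cpp
// Stand-in for a prototype from some third-party header (hypothetical name,
// not taken from zlib). The parameter is Latin small letter c (U+0063).
constexpr int header_style_function(int c) { return c + 1; }

// User code in the including file. The parameter is Cyrillic small letter es
// (U+0441), written as a UCN; in UTF-8 source it would render just like "c".
constexpr int user_function(int \u0441) { return \u0441 * 2; }

// Each parameter is in scope only inside its own function body, so both
// functions compile and behave independently of the other's parameter name.
static_assert(header_style_function(1) == 2, "Latin-named parameter works");
static_assert(user_function(2) == 4, "Cyrillic-named parameter works");
```

A check done in the stringpool would see both spellings in the same
translation unit and complain, whereas a check done during name lookup
would not, since neither function body can name the other's parameter.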