This went in in r174765-9 in Clang. Are there LLVM-side failures too, or is this good enough?
Jordan On Feb 5, 2013, at 9:26 , Jordan Rose <[email protected]> wrote: > [+Joerg, +Dmitri, -Takumi] > > We actually had a discussion about this on IRC. The place where this > information currently lives is Clang's Lexer.cpp, which is not at all ideal > for reuse. However, some systems actually link Clang against an LLVM dylib > (Debian is the only one that comes to mind), and Lexer's use of the CharInfo > table is very performance-critical. We decided that for that reason it > wouldn't be safe to move the table out of Clang. > > Since you hadn't responded yet, I began pulling it out to a Clang support > file in Basic: http://llvm-reviews.chandlerc.com/D363. I don't think this > will have disastrous performance implications, and it makes those available > to the rest of Clang...but not LLVM. I'm not sure what we're going to do > about that, but as long as we keep Unicode out of IR for now we'll be > fine...right? > > FWIW, isdigit and isxdigit are safe to use because they are guaranteed to be > locale-independent. > > Jordan > > > On Feb 5, 2013, at 3:46 , "Benyei, Guy" <[email protected]> wrote: > >> I can take care of this change. >> Do we already have LLVM ASCII-only versions for the character classification >> functions somewhere, or should I create them? I tried to look for these >> functions in LLVM support, but couldn't find them. >> >> Thanks >> Guy >> >> -----Original Message----- >> From: Jordan Rose [mailto:[email protected]] >> Sent: Wednesday, January 30, 2013 18:36 >> To: Benyei, Guy >> Cc: NAKAMURA Takumi; [email protected] >> Subject: Re: [cfe-commits] r173622 - PR15067: Don't assert when a UCN >> appears in a C90 file. >> >> Ah...good point. Richard pointed out to me that we probably shouldn't be >> using the library functions anyway, since sometimes they're locale-sensitive >> and sometimes they aren't. We can replace them all with LLVM ASCII-only >> versions that take a char instead of int. >> >> Do you want to seek out all the uses or should I? >> >> >> On Jan 29, 2013, at 0:27 , "Benyei, Guy" <[email protected]> wrote: >> >>> Right, but you did change the failing tests, and added Unicode characters >>> (FixIt/fixit-unicode.c, CodeGen\ucn-identifiers.c). >>> Anyhow, the failures occur in debug build only: >>> >>> CRT assert: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c(56) : >>> Assertion failed: (unsigned)(c + 1) <= 256 >>> >>> Apparently this assertion is supposed to check that the value passed to the >>> character classification functions is an actual character. However, since >>> in clang and LLVM there are several places where a signed char is passed to >>> these function - the negative char values are simply passed as negative int >>> values, and the c-style cast above results in some really big numbers, that >>> fail the assertion. >>> >>> Thanks >>> Guy >>> >>> >>> -----Original Message----- >>> From: Jordan Rose [mailto:[email protected]] >>> Sent: Monday, January 28, 2013 19:03 >>> To: Benyei, Guy; NAKAMURA Takumi >>> Cc: [email protected] >>> Subject: Re: [cfe-commits] r173622 - PR15067: Don't assert when a UCN >>> appears in a C90 file. >>> >>> Hm, I didn't change any of these, and they've been working fine on Takumi's >>> bot for a while now. Takumi, do you have an opinion on these fixes? >>> >>> Jordan >>> >>> >>> On Jan 28, 2013, at 5:45 , "Benyei, Guy" <[email protected]> wrote: >>> >>>> Hi Jordan, >>>> Working on Windows/Visual Studio 2010, I get asserts due to the usage of >>>> signed chars representing Unicode characters in some library functions >>>> expecting int arguments (specifically: isdigit and isalnum). Ultimately, >>>> this makes Visual Studio crash while running check-clang. >>>> A possible solution would be to cast these char values to unsigned char: >>>> >>>> Index: LiteralSupport.cpp >>>> =================================================================== >>>> --- LiteralSupport.cpp (revision 173676) >>>> +++ LiteralSupport.cpp (working copy) >>>> @@ -458,7 +458,7 @@ >>>> // and FP constants (specifically, the 'pp-number' regex), and >>>> assumes that // the byte at "*end" is both valid and not part of the >>>> regex. Because of // this, it doesn't have to check for 'overscan' in >>>> various places. >>>> - assert(!isalnum(*ThisTokEnd) && *ThisTokEnd != '.' && *ThisTokEnd >>>> != '_' && >>>> + assert(!isalnum((unsigned char)*ThisTokEnd) && *ThisTokEnd != '.' >>>> + && *ThisTokEnd != '_' && >>>> "Lexer didn't maximally munch?"); >>>> >>>> s = DigitsBegin = ThisTokBegin; >>>> >>>> >>>> Index: AsmWriter.cpp >>>> =================================================================== >>>> --- AsmWriter.cpp (revision 173676) >>>> +++ AsmWriter.cpp (working copy) >>>> @@ -117,7 +117,7 @@ >>>> } >>>> >>>> // Scan the name to see if it needs quotes first. >>>> - bool NeedsQuotes = isdigit(Name[0]); >>>> + bool NeedsQuotes = isdigit((unsigned char)Name[0]); >>>> if (!NeedsQuotes) { >>>> for (unsigned i = 0, e = Name.size(); i != e; ++i) { >>>> // By making this unsigned, the value passed in to isalnum will >>>> always be >>>> >>>> Anyhow, these are only the specific places the current LITs fail on - I'm >>>> really not sure if there are others. >>>> >>>> Thanks >>>> Guy >>>> >>>> -----Original Message----- >>>> From: [email protected] >>>> [mailto:[email protected]] On Behalf Of Jordan Rose >>>> Sent: Sunday, January 27, 2013 22:12 >>>> To: [email protected] >>>> Subject: [cfe-commits] r173622 - PR15067: Don't assert when a UCN appears >>>> in a C90 file. >>>> >>>> Author: jrose >>>> Date: Sun Jan 27 14:12:04 2013 >>>> New Revision: 173622 >>>> >>>> URL: http://llvm.org/viewvc/llvm-project?rev=173622&view=rev >>>> Log: >>>> PR15067: Don't assert when a UCN appears in a C90 file. >>>> >>>> Unfortunately, we can't accept the UCN as an extension because we're >>>> required to treat it as two tokens for preprocessing purposes. >>>> >>>> Modified: >>>> cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td >>>> cfe/trunk/lib/Lex/Lexer.cpp >>>> cfe/trunk/lib/Lex/LiteralSupport.cpp >>>> cfe/trunk/test/Lexer/c90.c >>>> >>>> Modified: cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td >>>> URL: >>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/Dia >>>> g nosticLexKinds.td?rev=173622&r1=173621&r2=173622&view=diff >>>> ===================================================================== >>>> = >>>> ======== >>>> --- cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td (original) >>>> +++ cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td Sun Jan 27 >>>> +++ 14:12:04 2013 >>>> @@ -127,6 +127,11 @@ def warn_cxx98_compat_literal_ucn_escape >>>> def warn_cxx98_compat_literal_ucn_control_character : Warning< >>>> "universal character name referring to a control character " >>>> "is incompatible with C++98">, InGroup<CXX98Compat>, DefaultIgnore; >>>> +def warn_ucn_not_valid_in_c89 : Warning< >>>> + "universal character names are only valid in C99 or C++; " >>>> + "treating as '\\' followed by identifier">, InGroup<Unicode>; def >>>> +warn_ucn_not_valid_in_c89_literal : ExtWarn< >>>> + "universal character names are only valid in C99 or C++">, >>>> +InGroup<Unicode>; >>>> >>>> >>>> // Literal >>>> @@ -165,8 +170,6 @@ def ext_string_too_long : Extension<"str >>>> "support">, InGroup<OverlengthStrings>; def err_character_too_large >>>> : Error< "character too large for enclosing character literal >>>> type">; -def >>>> warn_ucn_not_valid_in_c89 : ExtWarn< >>>> - "unicode escape sequences are only valid in C99 or C++">, >>>> InGroup<Unicode>; def warn_cxx98_compat_unicode_literal : Warning< >>>> "unicode literals are incompatible with C++98">, >>>> InGroup<CXX98Compat>, DefaultIgnore; >>>> >>>> Modified: cfe/trunk/lib/Lex/Lexer.cpp >>>> URL: >>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Lex/Lexer.cpp?rev=1 >>>> 7 >>>> 3622&r1=173621&r2=173622&view=diff >>>> ===================================================================== >>>> = >>>> ======== >>>> --- cfe/trunk/lib/Lex/Lexer.cpp (original) >>>> +++ cfe/trunk/lib/Lex/Lexer.cpp Sun Jan 27 14:12:04 2013 >>>> @@ -2698,8 +2698,6 @@ bool Lexer::isCodeCompletionPoint(const >>>> >>>> uint32_t Lexer::tryReadUCN(const char *&StartPtr, const char *SlashLoc, >>>> Token *Result) { >>>> - assert(LangOpts.CPlusPlus || LangOpts.C99); >>>> - >>>> unsigned CharSize; >>>> char Kind = getCharAndSize(StartPtr, CharSize); >>>> >>>> @@ -2711,6 +2709,11 @@ uint32_t Lexer::tryReadUCN(const char *& else >>>> return 0; >>>> >>>> + if (!LangOpts.CPlusPlus && !LangOpts.C99) { >>>> + Diag(SlashLoc, diag::warn_ucn_not_valid_in_c89); >>>> + return 0; >>>> + } >>>> + >>>> const char *CurPtr = StartPtr + CharSize; const char *KindLoc = >>>> &CurPtr[-1]; >>>> >>>> @@ -2737,7 +2740,7 @@ uint32_t Lexer::tryReadUCN(const char *& >>>> } >>>> } >>>> } >>>> - >>>> + >>>> return 0; >>>> } >>>> >>>> >>>> Modified: cfe/trunk/lib/Lex/LiteralSupport.cpp >>>> URL: >>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Lex/LiteralSupport. >>>> c pp?rev=173622&r1=173621&r2=173622&view=diff >>>> ===================================================================== >>>> = >>>> ======== >>>> --- cfe/trunk/lib/Lex/LiteralSupport.cpp (original) >>>> +++ cfe/trunk/lib/Lex/LiteralSupport.cpp Sun Jan 27 14:12:04 2013 >>>> @@ -277,7 +277,7 @@ static bool ProcessUCNEscape(const char >>>> >>>> if (!Features.CPlusPlus && !Features.C99 && Diags) >>>> Diag(Diags, Features, Loc, ThisTokBegin, UcnBegin, ThisTokBuf, >>>> - diag::warn_ucn_not_valid_in_c89); >>>> + diag::warn_ucn_not_valid_in_c89_literal); >>>> >>>> return true; >>>> } >>>> >>>> Modified: cfe/trunk/test/Lexer/c90.c >>>> URL: >>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Lexer/c90.c?rev=17 >>>> 3 >>>> 622&r1=173621&r2=173622&view=diff >>>> ===================================================================== >>>> = >>>> ======== >>>> --- cfe/trunk/test/Lexer/c90.c (original) >>>> +++ cfe/trunk/test/Lexer/c90.c Sun Jan 27 14:12:04 2013 >>>> @@ -29,8 +29,8 @@ void test2() { >>>> } >>>> >>>> void test3() { >>>> - (void)L"\u1234"; // expected-error {{unicode escape sequences are >>>> only valid in C99 or C++}} >>>> - (void)L'\u1234'; // expected-error {{unicode escape sequences are >>>> only valid in C99 or C++}} >>>> + (void)L"\u1234"; // expected-error {{universal character names >>>> + are only valid in C99 or C++}} (void)L'\u1234'; // expected-error >>>> + {{universal character names are only valid in C99 or C++}} >>>> } >>>> >>>> #define PREFIX(x) foo ## x >>>> @@ -39,3 +39,5 @@ int test4() { >>>> int *p = &PREFIX(0p+1); >>>> return p[-1]; >>>> } >>>> + >>>> +#define MY_UCN \u00FC // expected-warning {{universal character >>>> +names are only valid in C99 or C++; treating as '\' followed by >>>> +identifier}} >>>> >>>> >>>> _______________________________________________ >>>> cfe-commits mailing list >>>> [email protected] >>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits >>>> --------------------------------------------------------------------- >>>> Intel Israel (74) Limited >>>> >>>> This e-mail and any attachments may contain confidential material for >>>> the sole use of the intended recipient(s). Any review or distribution >>>> by others is strictly prohibited. If you are not the intended >>>> recipient, please contact the sender and delete all copies. >>>> >>> >>> --------------------------------------------------------------------- >>> Intel Israel (74) Limited >>> >>> This e-mail and any attachments may contain confidential material for >>> the sole use of the intended recipient(s). Any review or distribution >>> by others is strictly prohibited. If you are not the intended >>> recipient, please contact the sender and delete all copies. >>> >> >> --------------------------------------------------------------------- >> Intel Israel (74) Limited >> >> This e-mail and any attachments may contain confidential material for >> the sole use of the intended recipient(s). Any review or distribution >> by others is strictly prohibited. If you are not the intended >> recipient, please contact the sender and delete all copies. >> > _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
