This went in in r174765-9 in Clang. Are there LLVM-side failures too, or is 
this good enough?

Jordan


On Feb 5, 2013, at 9:26 , Jordan Rose <[email protected]> wrote:

> [+Joerg, +Dmitri, -Takumi]
> 
> We actually had a discussion about this on IRC. The place where this 
> information currently lives is Clang's Lexer.cpp, which is not at all ideal 
> for reuse. However, some systems actually link Clang against an LLVM dylib 
> (Debian is the only one that comes to mind), and Lexer's use of the CharInfo 
> table is very performance-critical. We decided that for that reason it 
> wouldn't be safe to move the table out of Clang.
> 
> Since you hadn't responded yet, I began pulling it out to a Clang support 
> file in Basic: http://llvm-reviews.chandlerc.com/D363. I don't think this 
> will have disastrous performance implications, and it makes those available 
> to the rest of Clang...but not LLVM. I'm not sure what we're going to do 
> about that, but as long as we keep Unicode out of IR for now we'll be 
> fine...right?
> 
> FWIW, isdigit and isxdigit are safe to use because they are guaranteed to be 
> locale-independent.
> 
> Jordan
> 
> 
> On Feb 5, 2013, at 3:46 , "Benyei, Guy" <[email protected]> wrote:
> 
>> I can take care of this change.
>> Do we already have LLVM ASCII-only versions for the character classification 
>> functions somewhere, or should I create them? I tried to look for these 
>> functions in LLVM support, but couldn't find them.
>> 
>> Thanks
>>  Guy
>> 
>> -----Original Message-----
>> From: Jordan Rose [mailto:[email protected]] 
>> Sent: Wednesday, January 30, 2013 18:36
>> To: Benyei, Guy
>> Cc: NAKAMURA Takumi; [email protected]
>> Subject: Re: [cfe-commits] r173622 - PR15067: Don't assert when a UCN 
>> appears in a C90 file.
>> 
>> Ah...good point. Richard pointed out to me that we probably shouldn't be 
>> using the library functions anyway, since sometimes they're locale-sensitive 
>> and sometimes they aren't. We can replace them all with LLVM ASCII-only 
>> versions that take a char instead of int.
>> 
>> Do you want to seek out all the uses or should I?
>> 
>> 
>> On Jan 29, 2013, at 0:27 , "Benyei, Guy" <[email protected]> wrote:
>> 
>>> Right, but you did change the failing tests, and added Unicode characters 
>>> (FixIt/fixit-unicode.c, CodeGen\ucn-identifiers.c). 
>>> Anyhow, the failures occur in debug build only:
>>> 
>>> CRT assert: f:\dd\vctools\crt_bld\self_x86\crt\src\isctype.c(56) : 
>>> Assertion failed: (unsigned)(c + 1) <= 256
>>> 
>>> Apparently this assertion is supposed to check that the value passed to the 
>>> character classification functions is an actual character. However, since 
>>> in clang and LLVM there are several places where a signed char is passed to 
>>> these function - the negative char values are simply passed as negative int 
>>> values, and the c-style cast above results in some really big numbers, that 
>>> fail the assertion.
>>> 
>>> Thanks
>>> Guy
>>> 
>>> 
>>> -----Original Message-----
>>> From: Jordan Rose [mailto:[email protected]]
>>> Sent: Monday, January 28, 2013 19:03
>>> To: Benyei, Guy; NAKAMURA Takumi
>>> Cc: [email protected]
>>> Subject: Re: [cfe-commits] r173622 - PR15067: Don't assert when a UCN 
>>> appears in a C90 file.
>>> 
>>> Hm, I didn't change any of these, and they've been working fine on Takumi's 
>>> bot for a while now. Takumi, do you have an opinion on these fixes?
>>> 
>>> Jordan
>>> 
>>> 
>>> On Jan 28, 2013, at 5:45 , "Benyei, Guy" <[email protected]> wrote:
>>> 
>>>> Hi Jordan,
>>>> Working on Windows/Visual Studio 2010, I get asserts due to the usage of 
>>>> signed chars representing Unicode characters in some library functions 
>>>> expecting int arguments (specifically: isdigit and isalnum). Ultimately, 
>>>> this makes Visual Studio crash while running check-clang.
>>>> A possible solution would be to cast these char values to unsigned char:
>>>> 
>>>> Index: LiteralSupport.cpp
>>>> ===================================================================
>>>> --- LiteralSupport.cpp     (revision 173676)
>>>> +++ LiteralSupport.cpp     (working copy)
>>>> @@ -458,7 +458,7 @@
>>>> // and FP constants (specifically, the 'pp-number' regex), and 
>>>> assumes that  // the byte at "*end" is both valid and not part of the 
>>>> regex.  Because of  // this, it doesn't have to check for 'overscan' in 
>>>> various places.
>>>> -  assert(!isalnum(*ThisTokEnd) && *ThisTokEnd != '.' && *ThisTokEnd 
>>>> != '_' &&
>>>> +  assert(!isalnum((unsigned char)*ThisTokEnd) && *ThisTokEnd != '.' 
>>>> + && *ThisTokEnd != '_' &&
>>>>       "Lexer didn't maximally munch?");
>>>> 
>>>> s = DigitsBegin = ThisTokBegin;
>>>> 
>>>> 
>>>> Index: AsmWriter.cpp
>>>> ===================================================================
>>>> --- AsmWriter.cpp  (revision 173676)
>>>> +++ AsmWriter.cpp  (working copy)
>>>> @@ -117,7 +117,7 @@
>>>> }
>>>> 
>>>> // Scan the name to see if it needs quotes first.
>>>> -  bool NeedsQuotes = isdigit(Name[0]);
>>>> +  bool NeedsQuotes = isdigit((unsigned char)Name[0]);
>>>> if (!NeedsQuotes) {
>>>>  for (unsigned i = 0, e = Name.size(); i != e; ++i) {
>>>>    // By making this unsigned, the value passed in to isalnum will 
>>>> always be
>>>> 
>>>> Anyhow, these are only the specific places the current LITs fail on - I'm 
>>>> really not sure if there are others.
>>>> 
>>>> Thanks
>>>> Guy
>>>> 
>>>> -----Original Message-----
>>>> From: [email protected] 
>>>> [mailto:[email protected]] On Behalf Of Jordan Rose
>>>> Sent: Sunday, January 27, 2013 22:12
>>>> To: [email protected]
>>>> Subject: [cfe-commits] r173622 - PR15067: Don't assert when a UCN appears 
>>>> in a C90 file.
>>>> 
>>>> Author: jrose
>>>> Date: Sun Jan 27 14:12:04 2013
>>>> New Revision: 173622
>>>> 
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=173622&view=rev
>>>> Log:
>>>> PR15067: Don't assert when a UCN appears in a C90 file.
>>>> 
>>>> Unfortunately, we can't accept the UCN as an extension because we're 
>>>> required to treat it as two tokens for preprocessing purposes.
>>>> 
>>>> Modified:
>>>> cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td
>>>> cfe/trunk/lib/Lex/Lexer.cpp
>>>> cfe/trunk/lib/Lex/LiteralSupport.cpp
>>>> cfe/trunk/test/Lexer/c90.c
>>>> 
>>>> Modified: cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td
>>>> URL: 
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/Dia
>>>> g nosticLexKinds.td?rev=173622&r1=173621&r2=173622&view=diff
>>>> =====================================================================
>>>> =
>>>> ========
>>>> --- cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td (original)
>>>> +++ cfe/trunk/include/clang/Basic/DiagnosticLexKinds.td Sun Jan 27
>>>> +++ 14:12:04 2013
>>>> @@ -127,6 +127,11 @@ def warn_cxx98_compat_literal_ucn_escape
>>>> def warn_cxx98_compat_literal_ucn_control_character : Warning<  
>>>> "universal character name referring to a control character "
>>>> "is incompatible with C++98">, InGroup<CXX98Compat>, DefaultIgnore;
>>>> +def warn_ucn_not_valid_in_c89 : Warning<
>>>> +  "universal character names are only valid in C99 or C++; "
>>>> +  "treating as '\\' followed by identifier">, InGroup<Unicode>; def 
>>>> +warn_ucn_not_valid_in_c89_literal : ExtWarn<
>>>> +  "universal character names are only valid in C99 or C++">, 
>>>> +InGroup<Unicode>;
>>>> 
>>>> 
>>>> // Literal
>>>> @@ -165,8 +170,6 @@ def ext_string_too_long : Extension<"str  
>>>> "support">, InGroup<OverlengthStrings>;  def err_character_too_large 
>>>> : Error<  "character too large for enclosing character literal 
>>>> type">; -def
>>>> warn_ucn_not_valid_in_c89 : ExtWarn<
>>>> -  "unicode escape sequences are only valid in C99 or C++">, 
>>>> InGroup<Unicode>;  def warn_cxx98_compat_unicode_literal : Warning<  
>>>> "unicode literals are incompatible with C++98">,  
>>>> InGroup<CXX98Compat>, DefaultIgnore;
>>>> 
>>>> Modified: cfe/trunk/lib/Lex/Lexer.cpp
>>>> URL: 
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Lex/Lexer.cpp?rev=1
>>>> 7
>>>> 3622&r1=173621&r2=173622&view=diff
>>>> =====================================================================
>>>> =
>>>> ========
>>>> --- cfe/trunk/lib/Lex/Lexer.cpp (original)
>>>> +++ cfe/trunk/lib/Lex/Lexer.cpp Sun Jan 27 14:12:04 2013
>>>> @@ -2698,8 +2698,6 @@ bool Lexer::isCodeCompletionPoint(const
>>>> 
>>>> uint32_t Lexer::tryReadUCN(const char *&StartPtr, const char *SlashLoc,
>>>>                         Token *Result) {
>>>> -  assert(LangOpts.CPlusPlus || LangOpts.C99);
>>>> -
>>>> unsigned CharSize;
>>>> char Kind = getCharAndSize(StartPtr, CharSize);
>>>> 
>>>> @@ -2711,6 +2709,11 @@ uint32_t Lexer::tryReadUCN(const char *&  else
>>>>  return 0;
>>>> 
>>>> +  if (!LangOpts.CPlusPlus && !LangOpts.C99) {
>>>> +    Diag(SlashLoc, diag::warn_ucn_not_valid_in_c89);
>>>> +    return 0;
>>>> +  }
>>>> +
>>>> const char *CurPtr = StartPtr + CharSize;  const char *KindLoc = 
>>>> &CurPtr[-1];
>>>> 
>>>> @@ -2737,7 +2740,7 @@ uint32_t Lexer::tryReadUCN(const char *&
>>>>        }
>>>>      }
>>>>    }
>>>> -      
>>>> +
>>>>    return 0;
>>>>  }
>>>> 
>>>> 
>>>> Modified: cfe/trunk/lib/Lex/LiteralSupport.cpp
>>>> URL: 
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Lex/LiteralSupport.
>>>> c pp?rev=173622&r1=173621&r2=173622&view=diff
>>>> =====================================================================
>>>> =
>>>> ========
>>>> --- cfe/trunk/lib/Lex/LiteralSupport.cpp (original)
>>>> +++ cfe/trunk/lib/Lex/LiteralSupport.cpp Sun Jan 27 14:12:04 2013
>>>> @@ -277,7 +277,7 @@ static bool ProcessUCNEscape(const char
>>>> 
>>>> if (!Features.CPlusPlus && !Features.C99 && Diags)
>>>>  Diag(Diags, Features, Loc, ThisTokBegin, UcnBegin, ThisTokBuf,
>>>> -         diag::warn_ucn_not_valid_in_c89);
>>>> +         diag::warn_ucn_not_valid_in_c89_literal);
>>>> 
>>>> return true;
>>>> }
>>>> 
>>>> Modified: cfe/trunk/test/Lexer/c90.c
>>>> URL: 
>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Lexer/c90.c?rev=17
>>>> 3
>>>> 622&r1=173621&r2=173622&view=diff
>>>> =====================================================================
>>>> =
>>>> ========
>>>> --- cfe/trunk/test/Lexer/c90.c (original)
>>>> +++ cfe/trunk/test/Lexer/c90.c Sun Jan 27 14:12:04 2013
>>>> @@ -29,8 +29,8 @@ void test2() {
>>>> }
>>>> 
>>>> void test3() {
>>>> -  (void)L"\u1234";  // expected-error {{unicode escape sequences are 
>>>> only valid in C99 or C++}}
>>>> -  (void)L'\u1234';  // expected-error {{unicode escape sequences are 
>>>> only valid in C99 or C++}}
>>>> +  (void)L"\u1234";  // expected-error {{universal character names 
>>>> + are only valid in C99 or C++}}  (void)L'\u1234';  // expected-error 
>>>> + {{universal character names are only valid in C99 or C++}}
>>>> }
>>>> 
>>>> #define PREFIX(x) foo ## x
>>>> @@ -39,3 +39,5 @@ int test4() {
>>>> int *p = &PREFIX(0p+1);
>>>> return p[-1];
>>>> }
>>>> +
>>>> +#define MY_UCN \u00FC // expected-warning {{universal character 
>>>> +names are only valid in C99 or C++; treating as '\' followed by 
>>>> +identifier}}
>>>> 
>>>> 
>>>> _______________________________________________
>>>> cfe-commits mailing list
>>>> [email protected]
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>>> ---------------------------------------------------------------------
>>>> Intel Israel (74) Limited
>>>> 
>>>> This e-mail and any attachments may contain confidential material for 
>>>> the sole use of the intended recipient(s). Any review or distribution 
>>>> by others is strictly prohibited. If you are not the intended 
>>>> recipient, please contact the sender and delete all copies.
>>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> Intel Israel (74) Limited
>>> 
>>> This e-mail and any attachments may contain confidential material for 
>>> the sole use of the intended recipient(s). Any review or distribution 
>>> by others is strictly prohibited. If you are not the intended 
>>> recipient, please contact the sender and delete all copies.
>>> 
>> 
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>> 
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
>> 
> 

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Reply via email to