On Fri, Nov 16, 2012 at 6:53 PM, Eli Friedman <[email protected]> wrote:
> On Thu, Nov 15, 2012 at 8:30 PM, Richard Smith <[email protected]> wrote:
>> On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman <[email protected]> wrote:
>>> Patch attached. Adds support for universal character names in identifiers,
>>> e.g.:
>>>
>>> char * \u00FC = "u-umlaut";
>>>
>>> Not that it's particularly useful, but it's a longstanding hole in our
>>> C99 support.
>>>
>>> The general outline of the approach is that the spelling of the
>>> identifier token contains the UCN, but the IdentifierInfo for the
>>> identifier token contains pure UTF-8. I think this is reasonable
>>> given the C phases of translation, and consistent with the way we
>>> handle UCNs in other contexts.
>>
>> This seems like a good approach to me.
>>
>>> I'm intentionally leaving out most of the support for universal
>>> character names in user-defined literals, to try and reduce the size
>>> of the patch.
>>
>> Index: include/clang/Lex/Lexer.h
>> ===================================================================
>> --- include/clang/Lex/Lexer.h (revision 168014)
>> +++ include/clang/Lex/Lexer.h (working copy)
>> @@ -573,6 +573,10 @@
>> void cutOffLexing() { BufferPtr = BufferEnd; }
>>
>> bool isHexaLiteral(const char *Start, const LangOptions &LangOpts);
>> +
>> + bool isUCNAfterSlash(const char *CurPtr, unsigned Size, unsigned SizeTmp[5]);
>> + void ConsumeUCNAfterSlash(const char *&CurPtr, unsigned SizeTmp[5],
>> + Token &Result);
>>
>> These [5]s should be [9]s. Also, how about wrapping the unsigned[9] in
>> a struct so it doesn't have to be repeated in so many places, or at
>> least passing it by reference so we'll get a compile error if the
>> caller's array is the wrong size?
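To illustrate the two alternatives suggested in the comment above, here is a minimal sketch. It is not code from the patch: `UCNSizes` and `recordSizes` are hypothetical names chosen for the example. The point is that a parameter written `unsigned SizeTmp[5]` decays to `unsigned *` and accepts an array of any length, whereas a reference to `unsigned[9]` (or a struct wrapping one) makes the length part of the type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper: the element count is spelled exactly once.
struct UCNSizes {
  static constexpr std::size_t Count = 9;
  unsigned Values[Count] = {};

  // Bounds-checked access (checked only when assertions are enabled).
  unsigned &at(std::size_t Index) {
    assert(Index < Count && "index out of range");
    return Values[Index];
  }
};

// Reference-to-array parameter: the length is part of the type, so a
// caller holding an unsigned[5] gets a compile error instead of an overrun.
inline void recordSizes(unsigned (&SizeTmp)[9]) {
  for (std::size_t I = 0; I != 9; ++I)
    SizeTmp[I] = static_cast<unsigned>(I + 1);
}

// Struct parameter: same guarantee, without repeating [9] at each site.
inline void recordSizes(UCNSizes &Sizes) {
  for (std::size_t I = 0; I != UCNSizes::Count; ++I)
    Sizes.at(I) = static_cast<unsigned>(I + 1);
}

// unsigned Small[5];
// recordSizes(Small);  // would not compile: no overload accepts unsigned[5]
```

Either form turns the [5]-versus-[9] mismatch into a compile-time error; the struct additionally keeps the count in one place.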
>>
>> Index: include/clang/Lex/Token.h
>> ===================================================================
>> --- include/clang/Lex/Token.h (revision 168014)
>> +++ include/clang/Lex/Token.h (working copy)
>> @@ -74,9 +74,10 @@
>> StartOfLine = 0x01, // At start of line or only after whitespace.
>> LeadingSpace = 0x02, // Whitespace exists before this token.
>> DisableExpand = 0x04, // This identifier may never be macro expanded.
>> - NeedsCleaning = 0x08, // Contained an escaped newline or trigraph.
>> + NeedsCleaning = 0x08, // Contained an escaped newline or trigraph.
>> LeadingEmptyMacro = 0x10, // Empty macro exists before this token.
>> - HasUDSuffix = 0x20 // This string or character literal has a ud-suffix.
>> + HasUDSuffix = 0x20, // This string or character literal has a ud-suffix.
>> + HasUCN = 0x40 // This identifier contains a UCN
>>
>> Missing full stop. ;-)
>>
>> The set of permitted characters appears to be correct only for C11 and
>> C++11: it seems that C99 (+TR1,2,3) and C++98 (+TC1) permitted smaller
>> sets (and not even the same smaller set!). C++98 used the list from
>> ISO/IEC PDTR 10176 and C99 used ISO/IEC TR 10176:1998 (surprisingly,
>> C++03 didn't move from the PDTR to the 1998 TR). If you're doing this
>> to have a complete C99 (and C++98, modulo 'export') implementation,
>> then maybe you care about this... :)
>
> I'll have to check whether I care about this.
>
>> + if (UCNIdentifierBuffer.empty() ? !isAllowedInitiallyIDChar(UcnVal) :
>> + !isAllowedIDChar(UcnVal)) {
>> + StringRef CurCharacter = CleanedStr.substr(i, NumChars);
>> + Diag(Identifier, diag::err_ucn_invalid_in_id) << CurCharacter;
>>
>> It'd be nice for the diagnostic to be different for UCNs which can't
>> appear at all versus UCNs which can't appear at the start of an
>> identifier.
>>
>>> I know this patch is a little lacking in terms of tests, but I'm not
>>> really sure what tests we need; suggestions welcome.
>>
>> UCNs which resolve to characters in the basic source character set.
>> Identifier emission in diagnostics.

Oh, and here, it isn't entirely clear what to print. Is "error:
redefinition of 'Đ'" or "error: redefinition of '\u1001'" better,
given that at the point where we print the diagnostic, we don't know
how it's written in the source?

-Eli
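For concreteness, both candidate renderings can be produced from a single code point. The following is a minimal sketch, not Clang code: `encodeUTF8` and `formatUCN` are hypothetical helper names, and the encoder assumes its input is a valid Unicode scalar value (no surrogates, at most U+10FFFF).

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Encode one code point as UTF-8 (assumes a valid scalar value).
inline std::string encodeUTF8(std::uint32_t CP) {
  std::string Out;
  if (CP < 0x80) {
    Out += static_cast<char>(CP);
  } else if (CP < 0x800) {
    Out += static_cast<char>(0xC0 | (CP >> 6));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else if (CP < 0x10000) {
    Out += static_cast<char>(0xE0 | (CP >> 12));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else {
    Out += static_cast<char>(0xF0 | (CP >> 18));
    Out += static_cast<char>(0x80 | ((CP >> 12) & 0x3F));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  }
  return Out;
}

// Spell the code point as a UCN: \uXXXX up to U+FFFF, \UXXXXXXXX above.
inline std::string formatUCN(std::uint32_t CP) {
  char Buf[11]; // backslash + 'U' + 8 hex digits + terminating NUL
  if (CP <= 0xFFFF)
    std::snprintf(Buf, sizeof(Buf), "\\u%04X", static_cast<unsigned>(CP));
  else
    std::snprintf(Buf, sizeof(Buf), "\\U%08X", static_cast<unsigned>(CP));
  return Buf;
}
```

For the identifier in the example at the top of the thread, `encodeUTF8(0xFC)` yields the two bytes 0xC3 0xBC, while `formatUCN(0xFC)` yields the six characters `\u00FC`. Which of the two a diagnostic should print is the open question above; the sketch only shows that both are derivable once the UTF-8 form is stored.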
