On Thu, Nov 15, 2012 at 8:30 PM, Richard Smith <[email protected]> wrote:
> On Thu, Nov 15, 2012 at 7:17 PM, Eli Friedman <[email protected]> wrote:
>> Patch attached. Adds support for universal character names in identifiers, e.g.:
>>
>> char * \u00FC = "u-umlaut";
>>
>> Not that it's particularly useful, but it's a longstanding hole in our
>> C99 support.
>>
>> The general outline of the approach is that the spelling of the
>> identifier token contains the UCN, but the IdentifierInfo for the
>> identifier token contains pure UTF-8. I think this is reasonable
>> given the C phases of translation, and consistent with the way we
>> handle UCNs in other contexts.
>
> This seems like a good approach to me.
>
>> I'm intentionally leaving out most of the support for universal
>> character names in user-defined literals, to try and reduce the size
>> of the patch.
>
> Index: include/clang/Lex/Lexer.h
> ===================================================================
> --- include/clang/Lex/Lexer.h (revision 168014)
> +++ include/clang/Lex/Lexer.h (working copy)
> @@ -573,6 +573,10 @@
>    void cutOffLexing() { BufferPtr = BufferEnd; }
>
>    bool isHexaLiteral(const char *Start, const LangOptions &LangOpts);
> +
> +  bool isUCNAfterSlash(const char *CurPtr, unsigned Size, unsigned SizeTmp[5]);
> +  void ConsumeUCNAfterSlash(const char *&CurPtr, unsigned SizeTmp[5],
> +                            Token &Result);
>
> These [5]s should be [9]s. Also, how about wrapping the unsigned[9] in
> a struct so it doesn't have to be repeated in so many places, or at
> least passing it by reference so we'll get a compile error if the
> caller's array is the wrong size?
>
> Index: include/clang/Lex/Token.h
> ===================================================================
> --- include/clang/Lex/Token.h (revision 168014)
> +++ include/clang/Lex/Token.h (working copy)
> @@ -74,9 +74,10 @@
>      StartOfLine = 0x01, // At start of line or only after whitespace.
>      LeadingSpace = 0x02, // Whitespace exists before this token.
>      DisableExpand = 0x04, // This identifier may never be macro expanded.
> -    NeedsCleaning = 0x08, // Contained an escaped newline or trigraph.
> +    NeedsCleaning = 0x08, // Contained an escaped newline or trigraph.
>      LeadingEmptyMacro = 0x10, // Empty macro exists before this token.
> -    HasUDSuffix = 0x20 // This string or character literal has a ud-suffix.
> +    HasUDSuffix = 0x20, // This string or character literal has a ud-suffix.
> +    HasUCN = 0x40 // This identifier contains a UCN
>
> Missing full stop. ;-)
>
> The set of permitted characters appears to be correct only for C11 and
> C++11: it seems that C99 (+TR1,2,3) and C++98 (+TC1) permitted smaller
> sets (and not even the same smaller set!). C++98 used the list from
> ISO/IEC PDTR 10176 and C99 used ISO/IEC TR 10176:1998 (surprisingly,
> C++03 didn't move from the PDTR to the 1998 TR). If you're doing this
> to have a complete C99 (and C++98, modulo 'export') implementation,
> then maybe you care about this... :)
>
> +    if (UCNIdentifierBuffer.empty() ? !isAllowedInitiallyIDChar(UcnVal) :
> +                                      !isAllowedIDChar(UcnVal)) {
> +      StringRef CurCharacter = CleanedStr.substr(i, NumChars);
> +      Diag(Identifier, diag::err_ucn_invalid_in_id) << CurCharacter;
>
> It'd be nice for the diagnostic to be different for UCNs which can't
> appear at all versus UCNs which can't appear at the start of an
> identifier.
>
>> I know this patch is a little lacking in terms of tests, but I'm not
>> really sure what tests we need; suggestions welcome.
>
> UCNs which resolve to characters in the basic source character set.
> Identifier emission in diagnostics.
> Stringization of tokens containing UCNs. (If I'm reading this right,
> we have a pre-existing bug here, in that characters outside the basic
> source character set must be converted into UCNs in the resulting
> string literal.)
> ud-suffixes for integer and floating-point.
UCNs for punctuation (;, <, etc)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=9449 mentions that '$' is
special somehow (comment 21)
>
> Do you want to ExtWarn on this in C89?
> _______________________________________________
> cfe-commits mailing list
> [email protected]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
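
For what it's worth, the review suggestion about the size array (pass it by reference, or wrap it in a struct) could look roughly like the sketch below. The names totalSize, UCNSizes and MaxChars are made up for illustration and are not from the patch; only the bound of 9 comes from the review comment.

```cpp
#include <cstddef>

// Option 1: take the array by reference. The bound is then part of the
// parameter type, so passing an `unsigned[5]` is a compile error instead
// of silently decaying to `unsigned *`.
inline unsigned totalSize(const unsigned (&SizeTmp)[9]) {
  unsigned Total = 0;
  for (unsigned S : SizeTmp)
    Total += S;
  return Total;
}

// Option 2: wrap the array in a struct (hypothetical name) so the bound
// is written down in exactly one place.
struct UCNSizes {
  static constexpr std::size_t MaxChars = 9;
  unsigned Sizes[MaxChars] = {};
};

inline unsigned totalSize(const UCNSizes &S) { return totalSize(S.Sizes); }
```

With either form, a caller declaring `unsigned Tmp[5]` and passing it to totalSize fails to compile, which a plain `unsigned SizeTmp[9]` parameter would not catch.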

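
On the request for separate diagnostics, one way to structure the check is to classify each UCN value once and let the caller pick the message. This is only a sketch: UCNStatus, classifyUCN and the two helper predicates are hypothetical names, and the ranges are a deliberately abbreviated subset of what I believe C11 Annex D lists, not the full table the patch would need.

```cpp
#include <cstdint>

enum class UCNStatus { Ok, NotAllowedAtStart, NotAllowedAnywhere };

// Abbreviated, illustrative subset of the "allowed in identifiers" ranges
// (assumed from C11 Annex D.1); the real table is much longer.
inline bool isAllowedIDCharSketch(std::uint32_t C) {
  return C == 0x00A8 || C == 0x00AA ||
         (C >= 0x00C0 && C <= 0x00D6) ||
         (C >= 0x00D8 && C <= 0x00F6) ||
         (C >= 0x00F8 && C <= 0x00FF) ||
         (C >= 0x0100 && C <= 0x167F);
}

// Combining-mark ranges that may not begin an identifier
// (assumed from C11 Annex D.2).
inline bool isDisallowedInitiallySketch(std::uint32_t C) {
  return (C >= 0x0300 && C <= 0x036F) || (C >= 0x1DC0 && C <= 0x1DFF) ||
         (C >= 0x20D0 && C <= 0x20FF) || (C >= 0xFE20 && C <= 0xFE2F);
}

// Classify one code point so the caller can emit a distinct diagnostic
// for "never valid in an identifier" versus "valid, but not first".
inline UCNStatus classifyUCN(std::uint32_t C, bool IsFirstChar) {
  if (!isAllowedIDCharSketch(C))
    return UCNStatus::NotAllowedAnywhere;
  if (IsFirstChar && isDisallowedInitiallySketch(C))
    return UCNStatus::NotAllowedAtStart;
  return UCNStatus::Ok;
}
```

Under these assumptions \u00FC is accepted in any position, \u0301 is accepted only after the first character, and \u003B (';') is rejected everywhere, which also covers the punctuation test case mentioned above.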