Another flaw here: if a UCN is not a valid identifier character, it gets read a second time by LexTokenInternal, which means we emit the warnings twice. I was trying to avoid a NoWarn variant, but maybe it's necessary.
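(A minimal, self-contained sketch of the idea, not code from the patch: the readUCN name, the EmitDiags flag, and the diagnostic text below are all assumptions made for illustration. The point is that the second read of the same UCN could pass false and stay silent, so the warning is only issued once.)

    // Toy example, not Clang code: parse a \u/\U escape starting at Pos and
    // return its value, or ~0u on failure. Diagnostics are optional so a
    // re-read of the same characters does not warn a second time.
    #include <cctype>
    #include <cstdio>
    #include <string>

    static unsigned readUCN(const std::string &Text, size_t &Pos, bool EmitDiags) {
      size_t Start = Pos;
      if (Pos + 1 >= Text.size() || Text[Pos] != '\\' ||
          (Text[Pos + 1] != 'u' && Text[Pos + 1] != 'U'))
        return ~0u;
      unsigned NumDigits = (Text[Pos + 1] == 'u') ? 4 : 8;
      Pos += 2;
      unsigned Value = 0;
      for (unsigned I = 0; I != NumDigits; ++I, ++Pos) {
        if (Pos >= Text.size() || !std::isxdigit((unsigned char)Text[Pos])) {
          if (EmitDiags)
            std::fprintf(stderr, "warning: incomplete universal character name\n");
          Pos = Start;
          return ~0u;
        }
        Value = Value * 16 + (std::isdigit((unsigned char)Text[Pos])
                                  ? unsigned(Text[Pos] - '0')
                                  : unsigned(std::tolower(Text[Pos]) - 'a' + 10));
      }
      return Value;
    }

    int main() {
      std::string Src = "\\u00e";               // incomplete UCN: only 3 hex digits
      size_t Pos = 0;
      readUCN(Src, Pos, /*EmitDiags=*/true);    // first read: warning emitted
      Pos = 0;
      readUCN(Src, Pos, /*EmitDiags=*/false);   // second read: silent
      return 0;
    }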
Jordan

On Jan 17, 2013, at 11:31 , Jordan Rose <[email protected]> wrote:

> How about this approach?
> - LexUnicode mirrors LexTokenInternal, dispatching to the proper lex method based on the first Unicode character in a token.
> - UCNs are validated in readUCN (called by LexTokenInternal and LexIdentifier). The specific identifier restrictions are checked in LexUnicode and LexIdentifier.
> - UCNs are recomputed in Preprocessor::LookUpIdentifierInfo because we start with the spelling info there, but all the validation has already happened.
>
> With these known flaws:
> - the classification of characters in LexUnicode should be more efficient.
> - poor recovery for a non-identifier UCN in an identifier. Right now I just take that to mean "end of identifier", which is the most pedantically correct thing to do, but it's probably not what's intended.
> - still needs more tests, of course
>
> FWIW, though, I'm not sure unifying literal Unicode and UCNs is actually a great idea. The case where it matters most (validation of identifier characters) is pretty easy to separate out into a helper function (and indeed it already is). The other cases (accepting Unicode whitespace and fixits for accidental Unicode) only make sense for literal Unicode, not escaped Unicode.
>
> Anyway, what do you think?
> Jordan
>
> <UCNs.patch>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
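(Also illustrative only: the "helper function" for validating identifier characters mentioned in the quoted message is roughly this shape. The names and the range table below are made up and deliberately incomplete; a real implementation would use the full tables from the C and C++ standards.)

    #include <cstdint>
    #include <cstdio>

    struct CodePointRange { uint32_t Lo, Hi; };

    // Tiny, incomplete sample of non-ASCII code points allowed in identifiers.
    static const CodePointRange AllowedRanges[] = {
        {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
    };

    // Shared check usable by both the literal-Unicode path and the UCN path.
    static bool isAllowedIdentifierCodePoint(uint32_t CP, bool IsFirstChar) {
      if (CP == '_' || (CP >= 'a' && CP <= 'z') || (CP >= 'A' && CP <= 'Z'))
        return true;
      if (CP >= '0' && CP <= '9')
        return !IsFirstChar;                 // digits may not start an identifier
      for (const CodePointRange &R : AllowedRanges)
        if (CP >= R.Lo && CP <= R.Hi)
          return true;
      return false;
    }

    int main() {
      std::printf("U+00E9 ok: %d, U+0040 ok: %d\n",
                  isAllowedIdentifierCodePoint(0x00E9, false),
                  isAllowedIdentifierCodePoint(0x0040, false));
      return 0;
    }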
