Another flaw here is that if a UCN is not a valid identifier character, it gets 
read a second time by LexTokenInternal, which means we emit the warnings 
twice. I was trying to avoid a NoWarn variant, but maybe it's necessary.
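To be concrete, the NoWarn variant I have in mind is just an extra "emit 
diagnostics" flag that the re-lexing caller passes as false. A standalone 
sketch of the idea -- none of these names or signatures match the patch's 
readUCN, it's only the shape:

    #include <cstdint>
    #include <cstdio>

    // Hypothetical stand-in for readUCN: when EmitDiags is false (the re-read
    // by LexTokenInternal), a malformed UCN is rejected silently instead of
    // being diagnosed a second time.
    static bool readUCNSketch(const char *Ptr, uint32_t &CodePoint,
                              bool EmitDiags) {
      if (Ptr[0] != '\\' || (Ptr[1] != 'u' && Ptr[1] != 'U')) {
        if (EmitDiags)
          std::fprintf(stderr, "warning: incomplete universal character name\n");
        return false;
      }
      CodePoint = 0;  // the real reader would decode the hex digits here
      return true;
    }

    int main() {
      const char *Buf = "\\q";  // backslash followed by 'q': not a UCN
      uint32_t CP;
      readUCNSketch(Buf, CP, /*EmitDiags=*/true);   // first read: warn
      readUCNSketch(Buf, CP, /*EmitDiags=*/false);  // re-read: stay quiet
      return 0;
    }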

Jordan


On Jan 17, 2013, at 11:31, Jordan Rose <[email protected]> wrote:

> How about this approach?
> - LexUnicode mirrors LexTokenInternal, dispatching to the proper lex method 
> based on the first Unicode character in a token (rough sketch after this list).
> - UCNs are validated in readUCN (called by LexTokenInternal and 
> LexIdentifier). The specific identifier restrictions are checked in 
> LexUnicode and LexIdentifier.
> - UCNs are recomputed in Preprocessor::LookUpIdentifierInfo because we start 
> with the spelling info there, but all the validation has already happened.
> 
> With these known flaws:
> - the classification of characters in LexUnicode should be more efficient.
> - poor recovery for a non-identifier UCN in an identifier. Right now I just 
> take that to mean "end of identifier", which is the most pedantically correct 
> thing to do, but it's probably not what's intended.
> - still needs more tests, of course
> 
> FWIW, though, I'm not sure unifying literal Unicode and UCNs is actually a 
> great idea. The case where it matters most (validation of identifier 
> characters) is pretty easy to separate out into a helper function (and indeed 
> it already is). The other cases (accepting Unicode whitespace and fixits for 
> accidental Unicode) only make sense for literal Unicode, not escaped Unicode.
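> 
> For instance, the shared helper is basically just this shape, callable from 
> both the literal-Unicode path and the UCN path. Only a few representative 
> ranges from C11 Annex D / C++11 [charname.allowed] are shown, and the 
> function name is made up:
> 
>     #include <cstdint>
> 
>     // Is this code point allowed in an identifier at all?  The real helper
>     // would carry the full allowed-character table.
>     static bool isAllowedIDCodePointSketch(uint32_t C) {
>       return (C >= 0x00C0 && C <= 0x00D6) ||  // Latin letters with diacritics
>              (C >= 0x0100 && C <= 0x167F) ||
>              (C >= 0x4E00 && C <= 0x9FFF);    // CJK, subset of 3040-D7FF
>     }
> 
>     int main() {
>       // U+4E2D is allowed; U+00A9 (copyright sign) is not.
>       return (isAllowedIDCodePointSketch(0x4E2D) &&
>               !isAllowedIDCodePointSketch(0x00A9)) ? 0 : 1;
>     }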
> 
> Anyway, what do you think?
> Jordan
> 
> <UCNs.patch>

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
