On Jan 24, 2013, at 13:34, Dmitri Gribenko <[email protected]> wrote:

> On Thu, Jan 24, 2013 at 10:50 PM, Jordan Rose <[email protected]> wrote:
>> Author: jrose
>> Date: Thu Jan 24 14:50:46 2013
>> New Revision: 173369
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=173369&view=rev
>> Log:
>> Handle universal character names and Unicode characters outside of literals.
>> 
>> This is a missing piece for C99 conformance.
>> 
>> This patch handles UCNs by adding a '\\' case to LexTokenInternal and
>> LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN.
>> If the UCN is not syntactically well-formed, we fall back to the old
>> treatment: a backslash followed by an identifier beginning with 'u' (or 'U').
>> 
>> Because the spelling of an identifier with UCNs still has the UCN in it, we
>> need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo.
>> 
>> Of course, valid code that does *not* use UCNs will see only a very minimal
>> performance hit (checks after each identifier for non-ASCII characters,
>> checks when converting raw_identifiers to identifiers that they do not
>> contain UCNs, and checks when getting the spelling of an identifier that it
>> does not contain a UCN).
>> 
>> This patch also adds basic support for actual UTF-8 in the source. This is
>> treated almost exactly the same as UCNs except that we consider stray
>> Unicode characters to be mistakes and offer a fixit to remove them.
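
For readers following along, here is a rough standalone sketch of the "tentatively read a UCN, else fall back" idea from the log above, plus the UTF-8 conversion of an identifier spelling. It is not the code from the patch: tryReadUCN, encodeUTF8 and normalizeSpelling are hypothetical names, it assumes C++17, and it only checks the syntactic shape of the UCN (\u plus 4 hex digits, \U plus 8), not the additional restrictions the standards place on which code points may appear.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not Clang's API. Tries to read "\uXXXX" or
// "\UXXXXXXXX" at the start of `text`. On success, returns the code point
// and sets `length` to the number of characters consumed. On failure,
// returns std::nullopt and leaves `length` untouched so the caller can fall
// back to treating the backslash as an ordinary character.
std::optional<std::uint32_t> tryReadUCN(std::string_view text,
                                        std::size_t &length) {
  if (text.size() < 2 || text[0] != '\\')
    return std::nullopt;

  std::size_t digits;
  if (text[1] == 'u')
    digits = 4;
  else if (text[1] == 'U')
    digits = 8;
  else
    return std::nullopt;

  if (text.size() < 2 + digits)
    return std::nullopt;

  std::uint32_t codePoint = 0;
  for (std::size_t i = 0; i < digits; ++i) {
    char c = text[2 + i];
    std::uint32_t value;
    if (c >= '0' && c <= '9')
      value = c - '0';
    else if (c >= 'a' && c <= 'f')
      value = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
      value = c - 'A' + 10;
    else
      return std::nullopt; // Not well-formed: caller falls back.
    codePoint = (codePoint << 4) | value;
  }
  length = 2 + digits;
  return codePoint;
}

// Encodes one code point as UTF-8. Assumes the caller has already checked
// that the value is within the Unicode range.
std::string encodeUTF8(std::uint32_t cp) {
  assert(cp <= 0x10FFFF && "code point out of Unicode range");
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Rewrites every well-formed UCN in `spelling` as UTF-8 and copies all
// other characters through unchanged (the fallback path).
std::string normalizeSpelling(std::string_view spelling) {
  std::string out;
  std::size_t i = 0;
  while (i < spelling.size()) {
    std::size_t length = 0;
    auto cp = tryReadUCN(spelling.substr(i), length);
    if (cp && *cp <= 0x10FFFF) {
      out += encodeUTF8(*cp);
      i += length;
    } else {
      out += spelling[i++];
    }
  }
  return out;
}
```

With this sketch, a spelling such as caf\u00e9 normalizes to the same bytes as the UTF-8 spelling of the identifier, while a malformed sequence such as \uXYZ is passed through untouched.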


>> +    // Instead of letting the parser complain about the unknown token,
>> +    // just warn that we don't have valid UTF-8, then drop the character.
> 
> The comment says 'just warn', but we throw an error here:
> 
>> +    if (!isLexingRawMode())
>> +      Diag(CurPtr, diag::err_invalid_utf8);


Yup. We're allowed to make this one an error because we get to map non-ASCII
characters down to ASCII however we want, and we can map them to an invalid
ASCII character. At least, that was my understanding of Richard's comments.
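
To make the "diagnose, then drop the character" behavior concrete, here is a minimal standalone sketch. It is an illustration only, not the lexer code: validSequenceLength and dropInvalidUTF8 are hypothetical names, the rawMode flag merely stands in for the isLexingRawMode() check in the hunk above, and errors are recorded as byte offsets rather than emitted through Diag.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper. Returns the length in bytes of the well-formed UTF-8
// sequence starting at text[pos], or 0 if the bytes there are not valid
// UTF-8 (bad lead byte, truncated or malformed continuation, overlong
// encoding, surrogate, or value above U+10FFFF).
std::size_t validSequenceLength(std::string_view text, std::size_t pos) {
  unsigned char lead = static_cast<unsigned char>(text[pos]);
  if (lead < 0x80)
    return 1;

  std::size_t length;
  std::uint32_t cp;
  std::uint32_t minimum; // Smallest value that needs this many bytes.
  if ((lead & 0xE0) == 0xC0) {
    length = 2; cp = lead & 0x1F; minimum = 0x80;
  } else if ((lead & 0xF0) == 0xE0) {
    length = 3; cp = lead & 0x0F; minimum = 0x800;
  } else if ((lead & 0xF8) == 0xF0) {
    length = 4; cp = lead & 0x07; minimum = 0x10000;
  } else {
    return 0; // Stray continuation byte or invalid lead byte.
  }

  if (pos + length > text.size())
    return 0;
  for (std::size_t i = 1; i < length; ++i) {
    unsigned char c = static_cast<unsigned char>(text[pos + i]);
    if ((c & 0xC0) != 0x80)
      return 0;
    cp = (cp << 6) | (c & 0x3F);
  }
  if (cp < minimum || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;
  return length;
}

// Copies `input` to the result, dropping every byte that does not begin a
// valid UTF-8 sequence. Unless `rawMode` is set, the offset of each dropped
// byte is appended to `errorOffsets` (a stand-in for emitting an error).
std::string dropInvalidUTF8(std::string_view input, bool rawMode,
                            std::vector<std::size_t> &errorOffsets) {
  std::string out;
  std::size_t pos = 0;
  while (pos < input.size()) {
    std::size_t length = validSequenceLength(input, pos);
    if (length == 0) {
      if (!rawMode)
        errorOffsets.push_back(pos);
      ++pos; // Drop the offending byte and keep going.
      continue;
    }
    out.append(input.substr(pos, length));
    pos += length;
  }
  return out;
}
```

The point of dropping the byte after reporting it is that later stages never see an unknown token, so the user gets a single diagnostic per bad byte instead of a cascade.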
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
