https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109877
--- Comment #19 from Iain Sandoe <iains at gcc dot gnu.org> ---
(In reply to Joseph S. Myers from comment #18)
> Preprocessing tokens are not context-sensitive in the language-level grammar
> for C (in the preprocessing grammar, header-name preprocessing tokens can
> only occur in certain contexts in directives and those sequences of
> characters are lexed differently outside those contexts). They are lexed
> using the greedy rule that the longest possible sequence of characters that
> can form the next preprocessing token or comment does so, regardless of the
> impact on subsequent parsing. The possible confusing nature of preprocessing
> numbers such as 0x74ae-0x4000 is explained in trouble.texi.

Is this formally specified in the standard? (it would be useful to have
something to point to if the clang implementation is non-conforming - even
though it is somewhat academic in terms of released code).

> My impression is that you don't want context-sensitive rules for determining
> preprocessing tokens here, you want context-sensitive rules for whether it's
> valid to convert a particular preprocessing number from a preprocessing
> token to a token (that is, in certain places inside attributes, you want the
> preprocessing token 10.1.0 to be converted to some new kind of token, not a
> floating literal, rather than resulting in an error because it doesn't
> convert to a valid token).

Yes, that is what my current hack does - which works for the C front end
because it is possible to elect to switch the tokeniser to tokenise that entity
as a string. For the C++ front end it is considerably harder, since the lexing
is done before we have any context to make such a switch.
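
To make the greedy rule concrete, here is a minimal sketch of a preprocessing-number scanner, written from my reading of the pp-number grammar. It is not code from GCC or clang: pp_number_len is a hypothetical helper, and it deliberately ignores universal character names and C23 digit separators. It shows why 0x74ae-0x4000 lexes as a single preprocessing token (the "e-" is absorbed as an exponent-like pair), whereas 0x74ad-0x4000 stops before the minus sign, and why 10.1.0 is one preprocessing number even though it is not a valid floating literal.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not a GCC or clang function): returns the length
   of the longest prefix of s that forms a C preprocessing number, or 0
   if s does not start one.  Simplified: no universal character names,
   no C23 digit separators.  */
size_t pp_number_len(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t i;

    /* A pp-number starts with a digit, or with '.' followed by a digit.  */
    if (isdigit(p[0]))
        i = 1;
    else if (p[0] == '.' && isdigit(p[1]))
        i = 2;
    else
        return 0;

    for (;;) {
        unsigned char c = p[i];

        /* e, E, p or P immediately followed by a sign is absorbed as a
           pair, whether or not the result is a valid number.  */
        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P')
            && (p[i + 1] == '+' || p[i + 1] == '-'))
            i += 2;
        /* Otherwise any digit, identifier character or '.' extends it.  */
        else if (isalnum(c) || c == '_' || c == '.')
            i += 1;
        else
            break;
    }
    return i;
}
```

With this sketch, pp_number_len("0x74ae-0x4000") covers the whole 13-character string, pp_number_len("0x74ad-0x4000") covers only the 6 characters of 0x74ad, and pp_number_len("10.1.0") covers all 6 characters. Whether such a preprocessing number later converts to a valid token is a separate step, which is where the context-sensitive handling inside attributes would have to live.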
