[Bug preprocessor/109877] is `10.1.0` a valid token sequence?

iains at gcc dot gnu.org via Gcc-bugs Wed, 13 May 2026 14:51:08 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109877


--- Comment #19 from Iain Sandoe <iains at gcc dot gnu.org> ---
(In reply to Joseph S. Myers from comment #18)
> Preprocessing tokens are not context-sensitive in the language-level grammar
> for C (in the preprocessing grammar, header-name preprocessing tokens can
> only occur in certain contexts in directives and those sequences of
> characters are lexed differently outside those contexts). They are lexed
> using the greedy rule that the longest possible sequence of characters that
> can form the next preprocessing token or comment does so, regardless of the
> impact on subsequent parsing. The possible confusing nature of preprocessing
> numbers such as 0x74ae-0x4000 is explained in trouble.texi.

Is this formally specified in the standard?

(it would be useful to have something to point to if the clang implementation
is non-conforming - even though it is somewhat academic in terms of released
code).

> My impression is that you don't want context-sensitive rules for determining
> preprocessing tokens here, you want context-sensitive rules for whether it's
> valid to convert a particular preprocessing number from a preprocessing
> token to a token (that is, in certain places inside attributes, you want the
> preprocessing token 10.1.0 to be converted to some new kind of token, not a
> floating literal, rather than resulting in an error because it doesn't
> convert to a valid token).

Yes, that is what my current hack does. It works for the C front end because
there it is possible to switch the tokeniser to tokenise that entity as a
string.

For the C++ front end it is considerably harder, since the lexing is done
before we have any context to make such a switch.
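
For illustration only, the greedy rule quoted above can be sketched as a small
scanner. This is a rough sketch, not what libcpp or clang actually do: it
assumes the pp-number grammar without the C23 digit separators, and both
pp_number_len and version_triple_from_pp_number are hypothetical helpers made
up for this example. The second one stands in for the kind of
context-dependent conversion being discussed, where a pp-number that is not a
valid floating literal is still accepted in a specific place.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Length of the preprocessing number at the start of s, or 0 if none
   starts there.  Hypothetical helper, for illustration only. */
static size_t pp_number_len(const char *s)
{
    size_t i;

    /* A pp-number starts with a digit, or with '.' followed by a digit. */
    if (isdigit((unsigned char)s[0]))
        i = 1;
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        i = 2;
    else
        return 0;

    /* Greedy rule: keep consuming as long as the result is still a pp-number. */
    for (;;) {
        unsigned char c = (unsigned char)s[i];
        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P')
            && (s[i + 1] == '+' || s[i + 1] == '-'))
            i += 2;   /* exponent letter immediately followed by a sign */
        else if (isalnum(c) || c == '_' || c == '.')
            i += 1;   /* digit, identifier-nondigit, or '.' */
        else
            break;
    }
    return i;
}

/* Hypothetical context-dependent conversion: accept a pp-number of the form
   major.minor.patch.  Returns 1 and fills v on success, 0 otherwise. */
static int version_triple_from_pp_number(const char *tok, size_t len, int v[3])
{
    int consumed = 0;

    if (sscanf(tok, "%d.%d.%d%n", &v[0], &v[1], &v[2], &consumed) != 3)
        return 0;
    return (size_t)consumed == len;
}

/* Cases from this thread, written as assertions. */
static void pp_number_selfcheck(void)
{
    int v[3];

    assert(pp_number_len("10.1.0") == 6);          /* one pp-number */
    assert(pp_number_len("0x74ae-0x4000") == 13);  /* "e-" keeps it going */
    assert(pp_number_len("0x74ad-0x4000") == 6);   /* stops before '-' */
    assert(version_triple_from_pp_number("10.1.0", 6, v));
    assert(!version_triple_from_pp_number("0x74ae-0x4000", 13, v));
}
```

Calling pp_number_selfcheck() from a main() exercises the cases mentioned in
this thread: 10.1.0 is lexed as a single pp-number, and only the later
conversion step decides whether that pp-number is acceptable in context.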
