https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97604

            Bug ID: 97604
           Summary: Bad digit separators accepted in pp-numbers
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: rejects-valid
          Severity: normal
          Priority: P3
         Component: preprocessor
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jsm28 at gcc dot gnu.org
                CC: emsr at gcc dot gnu.org
  Target Milestone: ---

cpplib lexes pp-numbers in lex_number.  Following bug 64626, that function
includes some logic to disallow a pp-number ending with a C++ digit separator.
However, that logic is insufficient to cover all cases where the lexing
includes too many characters in the pp-number.

Compile the following with -std=c++17:

int a = 0x0'e-0xe;

This gives a bogus error:

t.cc:1:9: error: unable to find numeric literal operator 'operator""-0xe'
    1 | int a = 0x0'e-0xe;
      |         ^~~~~~~~~
t.cc:1:9: note: use '-fext-numeric-literals' to enable more built-in suffixes

The pp-number syntax starts a pp-number with "digit" or ". digit" and then
allows various things to follow, one of which is "' nondigit" and another one
of which is "e sign".  The longest possible preprocessing token starting with
the first 0 in the above example is 0x0'e: "e sign" may only follow a complete
pp-number, and the text preceding "e-" here is 0x0', which ends with "'" and so
is not a pp-number.  So 0x0'e is a preprocessing token, followed by "-", and
the above is in fact a subtraction of two separate integer literals, i.e. valid
C++ input.

"'" must only be accepted in a pp-number when followed by a digit or nondigit,
and if that nondigit is e, E, p or P and is followed by a sign, the pp-number
ends at that letter and the sign is not part of it.  Although I haven't given
examples here, you can probably construct
rejects-valid examples (ones involving macro expansion, at least) also for the
case of wrongly accepting a digit separator followed by a UCN / UTF-8 character
(an identifier-nondigit that is not a nondigit) or '.'.  The case of
consecutive digit separators shouldn't introduce rejects-valid bugs because ''
isn't valid at the start of a preprocessing token, but bug 83873 would be fixed
by following the syntax in lex_number and rejecting them there rather than
trying to catch them later.
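
As a rough illustration of what following the syntax could look like, here is
a minimal sketch.  It is not cpplib code: pp_number_length is a hypothetical
helper, and it assumes ASCII input only (no UCNs or UTF-8 identifier
characters).  The point it tries to show is that "' digit" and "' nondigit"
are consumed as a unit, so a letter consumed that way is never reconsidered
as the start of "e sign".

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of cpplib): returns the length of the
// longest pp-number at the start of s, or 0 if s does not start with one.
// Assumption: ASCII input only; UCNs and UTF-8 characters are not handled.
std::size_t pp_number_length(std::string_view s)
{
    auto is_digit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };
    auto is_nondigit = [](char c) {
        return c == '_' || std::isalpha(static_cast<unsigned char>(c)) != 0;
    };
    auto is_sign = [](char c) { return c == '+' || c == '-'; };
    auto is_exp = [](char c) {
        return c == 'e' || c == 'E' || c == 'p' || c == 'P';
    };

    // A pp-number starts with "digit" or ". digit".
    std::size_t i = 0;
    if (!s.empty() && is_digit(s[0]))
        i = 1;
    else if (s.size() >= 2 && s[0] == '.' && is_digit(s[1]))
        i = 2;
    else
        return 0;

    while (i < s.size()) {
        char c = s[i];
        char next = (i + 1 < s.size()) ? s[i + 1] : '\0';

        if (c == '\'') {
            // "' digit" or "' nondigit": consume both characters together.
            // Anything else after "'" ends the pp-number before the "'".
            if (is_digit(next) || is_nondigit(next)) {
                i += 2;
                continue;
            }
            break;
        }
        if (is_exp(c) && is_sign(next)) {
            // "e sign", "E sign", "p sign", "P sign".
            i += 2;
            continue;
        }
        if (is_digit(c) || is_nondigit(c) || c == '.') {
            ++i;
            continue;
        }
        break;
    }
    return i;
}
```

With this sketch, pp_number_length("0x0'e-0xe") is 5, i.e. the token 0x0'e,
leaving "-" and 0xe to be lexed separately, while "1''2" and "1'.2" both stop
after the leading 1.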
