Re: [PATCH] COBOL 6/15 156K lex: lexer

David Malcolm Mon, 17 Feb 2025 12:29:22 -0800

On Mon, 2025-02-17 at 12:29 -0500, James K. Lowden wrote:
> On Sat, 15 Feb 2025 23:32:37 -0500
> David Malcolm <dmalc...@redhat.com> wrote:
> 
> In defense of lack of free(3) ...
> 
> > > +const char *
> > > +esc( size_t len, const char input[] ) {
> > > +  static char spaces[] = "([,;]?[[:space:]])+";
> > > +  static char spaceD[] = "(\n {6}D" "|" "[,;]?[[:space:]])+";
> > > +  static char buffer[64 * 1024];
> > > +  char *p = buffer;
> > > +  const char *eoinput = input + len;
> > > +
> > > +  const char *spacex = is_reference_format()? spaceD : spaces;
> > > +
> > > +  for( const char *s=input; *s && s < eoinput; s++ ) {
> > > +    *p = '\0';
> > > +    gcc_assert( size_t(p - buffer) < sizeof(buffer) - 4 );
> 
> overflow guarded here


...but only via gcc_assert.

gcc/system.h has:

/* Use gcc_assert(EXPR) to test invariants.  */
#if ENABLE_ASSERT_CHECKING
#define gcc_assert(EXPR)                                                \
   ((void)(!(EXPR) ? fancy_abort (__FILE__, __LINE__, __FUNCTION__), 0 : 0))
#elif (GCC_VERSION >= 4005)
#define gcc_assert(EXPR)                                                \
  ((void)(UNLIKELY (!(EXPR)) ? __builtin_unreachable (), 0 : 0))
#else
/* Include EXPR, so that unused variable warnings do not occur.  */
#define gcc_assert(EXPR) ((void)(0 && (EXPR)))
#endif

so if ENABLE_ASSERT_CHECKING is false, that check gets preprocessed
away.

It turns out that (in gcc/configure.ac) that with --enable-
checking=release (which is soon to become the default for gcc 15), that
ENABLE_ASSERT_CHECKING is still true, but it is possible to configure
gcc with it false (e.g. with --enable-checking=none).

We shouldn't rely on assert to do checking of user-controllable input;
it should always be checked.

> 
> > > +    switch(*s) {
> > > +    case '^': case '$':
> > > +    case '(': case ')':
> > > +    case '*': case '+': case '?':
> > > +    case '[': case ']':
> > > +    case '{': case '}':
> > > +    case '|':
> > > +    case '.':
> > > +      *p++ = '\\';
> > > +      *p++ = *s;
> > > +      break;
> > > +    case '\\':
> > > +      *p++ = '[';
> > > +      *p++ = *s;
> > > +      *p++ = ']';
> > > +      break;
> > > +
> > > +    case ';': case ',':
> > > +      if( ! (s+1 < eoinput && s[1] == 0x20) ) {
> > > +        *p++ = *s;
> > > +        break;
> > > +      }
> > > +      __attribute__((fallthrough));
> > > +    case 0x20: case '\n':
> > > +      gcc_assert(p + sizeof(spacex) < buffer + sizeof(buffer));
> 
> and overflow guarded here, the only place where more than 4
> characters can be inserted into the buffer.  

Likewise.
> 
> > > +      p = stpcpy( p, spacex );
> > > +      while( s+1 < eoinput && is_separator_space(s+1)) {
> > > +        s++;
> > > +      }
> > > +      break;
> > > +    default:
> > > +      *p++ = *s;
> > > +      break;
> > > +    }
> > > +  }
> > > +  *p = '\0';
> ...
> > > +  return xstrdup(buffer);
> > > +}
> > 
> > Has a fixed size 64k buffer; doesn't seem to have proper overflow
> > handling. Could use a pretty_printer to accumulate chars.
> 
> Thank you for these comments.  Let me see if I can alleviate your
> concerns.  
> 
> This function is called from exactly one place, where the file-
> reader, lexio, parses a REPLACE directive.  The COBOL input says
> "REPLACE X BY Y" subject to some constraints.  Because we're using
> regex to find X, and because X might be any arbitrary string, the
> esc() function escapes regex metacharacters prior to executing the
> regex.  
> 
> IMHO it's unlikely the resulting regex input will exceed 64 KB.  It's
> unlikely to be even 64 bytes.  (Usually X is a COBOL identifier,
> limited to 64 bytes by ISO.)  Granted, a fixed maximum is a
> limitation.  But I put it to you: which is more likely?  For a regex
> to exceed 64 KB, or for heap allocation to fail to return adequate
> memory?  If there's some crazy input, I'd rather die at 64 KB than
> consume gigabytes of swap on the way to crashing.  

If you're going to have it die at 64KB, please make sure it *does* die
(hence my concerns about using gcc_assert for the bounds checking).

Hypothetical scenario: sysadmin sets up a service that accepts
arbitrary COBOL code to be compiled (such as Compiler Explorer).  Can
an attacker construct a not-necessarily valid input COBOL program that
overruns the on-stack buffer?  If so, they can probably use this to
inject code into the process, at which point they've got a beachhead to
attack the system further.  (Though IIRC Compiler Explorer already
sandboxes the compilers that are running there)

[...]

Dave

Re: [PATCH] COBOL 6/15 156K lex: lexer

Reply via email to