Re: [PATCH] Make strlen range computations more conservative

Martin Sebor Mon, 06 Aug 2018 10:16:07 -0700

These examples do not aim to be valid C, they just point out limitations
of the middle-end design, and a good deal of the problems are due
to trying to do things that are not safe within the boundaries given
by the middle-end design.

I really think this is important -- and as such I think we need to move
away from trying to describe scenarios in C because doing so keeps
bringing us back to the "C doesn't allow XYZ" kinds of arguments when
what we're really discussing are GIMPLE semantic issues.


So examples should be GIMPLE.  You might start with (possibly invalid) C
code to generate the GIMPLE, but the actual discussion needs to be
looking at GIMPLE.  We might include the C code in case someone wants to
look at things in a debugger, but bringing the focus to GIMPLE is really
important here.


I don't understand the goal of this exercise.  Unless the GIMPLE
code is the result of a valid test case (in some language GCC
supports), what does it matter what it looks like?  The basis of
every single transformation done by a compiler is that the source
code is correct.  If it isn't then all bets are off.  I'm no GIMPLE
expert but even I can come up with any number of GIMPLE expressions
that have undefined behavior.  What would that prove?

But let me try anyway.  Here's a simplified (and gimplified) version
of the test case that started this debate:

  struct S { char a[4], b; };

  f (struct S * p)
  {
    int D.1908;

    _1 = &p->a;
    _2 = __builtin_strlen (_1);   // strlen (p->a);
    D.1908 = (int) _2;
    return D.1908;
  }

and one involving a pointer:

  g (struct S * p)
  {
    int D.1910;
    char * q;

    q = &p->a;
    _1 = __builtin_strlen (q);
    D.1910 = (int) _1;
    return D.1910;
  }

and another one involving a pointer and strcpy and
_FORTIFY_SOURCE:

  h (struct S * p)
  {
    int D.2208;
    char * q;

    q = &p->a;
    _1 = strcpy (q, "1234");
    _2 = (long int) _1;
    D.2208 = (int) _2;
    return D.2208;
  }

with strcpy defined as:

  __attribute__((artificial, gnu_inline, always_inline, leaf, nothrow))
  strcpy (char * restrict __dest, const char * restrict __src)
  {
    char * D.2210;

    _1 = __builtin_object_size (__dest, 1);
    D.2210 = __builtin___strcpy_chk (__dest, __src, _1);
    return D.2210;
  }
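
For anyone who wants to reproduce the dumps, here is a sketch of C source that should gimplify to roughly the shape of f and g above (for example with -fdump-tree-gimple). The exact dump depends on the GCC version and options, so treat it as an assumption rather than the original test case. h is left out because calling it writes five bytes into the four-byte member by design.

```c
#include <string.h>

struct S { char a[4], b; };

/* Direct member access: strlen (p->a). */
int f(struct S *p)
{
    return (int) strlen(p->a);
}

/* The same call made through an intermediate pointer. */
int g(struct S *p)
{
    char *q = p->a;
    return (int) strlen(q);
}
```

Both functions return 3 for an object whose a member holds "abc".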

What does this show?

AFAICS, all three functions are equivalent GIMPLE, yet I'm being
told that the first one is different in some important detail from
the second, and that even though it's the same as the third and
even though it's good to have __strcpy_chk() abort in the third
case it's bad for the strlen() call to return a value constrained
to [0, 3].  Would defining strlen like so

  __attribute__((artificial, gnu_inline, always_inline, leaf, nothrow))
  strlen (const char * __src)
  {
    long unsigned int D.2210;

    _1 = __builtin_object_size (__src, 1);
    D.2210 = __builtin___strlen_chk (__src, _1);
    return D.2210;
  }

and having __strlen_chk() abort if __src were longer than _1
also be bad?  (If not -- I sincerely hope that's the answer --
then I'll be happy to put together a patch for that.  In fact,
I think it would be useful to extend this to all string
functions, i.e., have them all abort on reads past the end,
just as they abort on writes.)
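
To make the proposal concrete, here is a minimal user-level sketch of what such a check could do. The names strlen_chk and checked_strlen are hypothetical (neither is an existing glibc or GCC function); only __builtin_object_size is a real GCC built-in, and it yields (size_t)-1 when the object size is unknown.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the string length, or aborts when no
   terminating NUL is found within the first OBJSIZE bytes.  */
static size_t strlen_chk(const char *s, size_t objsize)
{
    /* (size_t)-1 means the object size is unknown: no check possible.  */
    if (objsize == (size_t) -1)
        return strlen(s);

    const char *nul = memchr(s, '\0', objsize);
    if (nul == NULL)
        abort();                /* the read would run past the object */

    return (size_t) (nul - s);
}

/* Hypothetical wrapper mirroring the fortified strcpy above: type 1
   asks for the size of the closest enclosing subobject.  */
#define checked_strlen(s) \
    strlen_chk((s), __builtin_object_size((s), 1))
```

With struct S as above, checked_strlen (p->a) would abort whenever the compiler can determine the size of p->a and the member holds no NUL among its four bytes, which is the read-side analogue of what __strcpy_chk() does for writes.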

Martin
