> Certainly not every "strlen" has these semantics.  For example,
> this open-coded one doesn't:
> 
>    int len = 0;
>    for (int i = 0; s.a[i]; ++i)
>      ++len;
> 
> It computes 2 (with no warning for the out-of-bounds access).
> 

Yes, which is questionable as well, but that happens only
if the source code accesses the array via s.a[i],
not if it goes through a char *, as this experiment shows:

$ cat y1.c
int len (const char *x)
{
   int len = 0;
   for (int i = 0; x[i]; ++i)
     ++len;
   return len;
}

const char a[3] = "123";

int main ()
{
   return len(a);
}

$ gcc -O3 y1.c
$  ./a.out ; echo $?
3

The loop is not optimized away.

$ cat y2.c
const char a[3] = "123";

int main ()
{
   int len = 0;
   for (int i = 0; a[i]; ++i)
     ++len;
   return len;
}

$ gcc -O3 y2.c
$ ./a.out ; echo $?
2


The point I am making is that it is impossible to know where a function
will be inlined, and whether the original code can then be broken in
surprising ways.  Most importantly, strlen is often used in
security-relevant ways.


> So if the standard doesn't guarantee it and different kinds
> of accesses behave differently, how do we explain what "works"
> and what doesn't without relying on GCC implementation details?
> 
> If we can't then the only language we have in common with users
> is the standard.  (This, by the way, is what the C memory model
> group is trying to address -- the language or feature that's
> missing from the standard that says when, if ever, these things
> might be valid.)

Sorry, but there are examples of undefined behaviour that GCC
deliberately does not exploit for code optimizations, only for warnings:
the undefinedness of signed left-shift overflow, for instance.

I think the possible return values of strlen should likewise not be used
for code optimizations.

Your optimization assumes the return value of strlen is always in the
range 0..size-1, even if the string is not nul-terminated.
But that is exactly the range that can _never_ be returned when the
string is not nul-terminated, which is why a strlen result is often
used as a check for nul termination. (*)

In reality, the return value is always in the range size..infinity, or
the function aborts; code like assert(strlen(x) < sizeof(x)) relies on
this basic knowledge.  The standard should mention these magic powers
of strlen and state that it will either abort or return >= sizeof(x).
Being unclear does not help anybody.


(*): This is done even here:
char *
__strcpy_chk (char *__restrict__ dest, const char *__restrict__ src,
              size_t slen)
{
  size_t len = strlen (src);
  if (len >= slen)
    __chk_fail ();
  return memcpy (dest, src, len + 1);
}

If you are right, __chk_fail will never be called, so why not optimize
it away?


Bernd.
