On 08/01/2018 02:40 AM, Jakub Jelinek wrote:
On Wed, Aug 01, 2018 at 09:19:43AM +0200, Richard Biener wrote:
And if so, what makes it well defined?

The fact that strlen takes a char * argument and thus inline-expansion
of a trivial implementation like

 int len = 0;
 for (; *p; ++p)
   ++len;

will have

 p = &s.a;

and the middle-end doesn't reconstruct s.a[..] from the pointer
access.
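
To make the point above concrete, here is a minimal sketch (not GCC's actual expansion) of such a trivial strlen applied to the first member of a struct. The names naive_strlen and len_of_a are hypothetical and used only for illustration; struct S mirrors the testcase in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the struct used in the testcases in this thread. */
struct S { char a[3]; char b[5]; };

/* Hypothetical helper: a trivial strlen, written as the loop quoted above. */
static size_t naive_strlen(const char *p)
{
    assert(p != NULL);
    size_t len = 0;
    for (; *p; ++p)
        ++len;
    return len;
}

/* Hypothetical caller: once naive_strlen is inlined here, the loop only
   sees a plain char * to the first byte of s->a; nothing in the loop
   itself records that the pointer came from the 3-element member.  */
static size_t len_of_a(const struct S *s)
{
    return naive_strlen(s->a);
}
```

The sketch only calls the helper on a string that is terminated inside the member, so it does not depend on how the disputed cross-member case is resolved.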

Some testcases:
struct S { char a[3]; char b[5]; } s[3] = { { "ab", "defg" }, { "h", "klmn" },
                                            { "opq", "rstu" } };

__SIZE_TYPE__
foo (int i, int a)
{
  volatile char *p = (volatile char *) &s[i].a;
  if (a)
    p = (volatile char *) &s[i];
  return __builtin_strlen ((char *) p);
}

If I call this with foo (2, 1), do you still claim it is not valid C?

String functions like strlen operate on character strings stored
in character arrays.  Calling strlen (&s[1]) is invalid because
&s[1] is not the address of a character array.  The fact that
objects can be represented as arrays of bytes doesn't change
that.  The standard may be somewhat loose with words on this
distinction but the intent certainly isn't for strlen to traverse
arbitrary sequences of bytes that cross subobject boundaries.
(That is the intent behind the raw memory functions, but
the current text doesn't make the distinction clear.)

I don't see in C anything that would say that this is not valid for strlen,
but valid for memcpy, and if you say it is not valid even for memcpy, then
pretty much nothing will work, we need memcpy to be able to copy whole objects
containing subobjects.
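
As an illustration of the memcpy case mentioned above, here is a minimal sketch of copying a whole object, subobjects included, in one call. The helper name copy_whole is hypothetical; struct S is the one from the testcases.

```c
#include <assert.h>
#include <string.h>

struct S { char a[3]; char b[5]; };

/* Hypothetical helper: copy every byte of *src, crossing the boundary
   between the members a and b, which is what whole-object copies rely on. */
static void copy_whole(struct S *dst, const struct S *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *src);
}
```

Here the size passed to memcpy is that of the complete object, so the copy necessarily spans both members.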

Yes, I understand the concern, and it is something for
the standard to reconcile.  As I said, the object model study
group is considering how to reflect this in the model.  One
suggestion was to treat unsigned char (and only it) special.
Another was to introduce a new type, byte, like C++ has, and
treat it special.  Yet another is to come up with a special
cast to change the provenance of a pointer.  There probably
will be others.

The middle-end optimizes the above into
  p_3 = &s[i_2(D)].a;
  _7 = __builtin_strlen (p_3);
(in fre3 in particular).

Or:
int bar (char *) __attribute__((pure));
int v;

__SIZE_TYPE__
baz (int i, int a)
{
  int b = bar (&s[i].a[0]);
  __SIZE_TYPE__ t = __builtin_strlen ((char *) &s[i]);
  v = b;
  return t;
}

where bar may be as simple as int bar (char *p) { return *p; }
The middle-end optimizes this into:
  _1 = &s[i_3(D)].a[0];
  b_5 = bar (_1);
  t_6 = __builtin_strlen (_1);

That is why for __builtin_object_size (..., [13]) (i.e. modes 1 and 3),
where we want to differentiate between the address of the whole object and
the address of its first member, we need to handle it very early, because
later on the distinction is gone.
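
For reference, a minimal sketch of the distinction __builtin_object_size draws between the whole object (mode 0) and the closest enclosing subobject (mode 1). The names g, size_whole and size_sub are hypothetical. With a directly addressed global like this one would expect the two calls to yield sizeof g and 3 respectively, but the builtin may also return (size_t) -1 when the compiler cannot determine the size, so the sketch does not assume a particular result.

```c
#include <assert.h>
#include <stddef.h>

struct S { char a[3]; char b[5]; };

/* Hypothetical global used only for this illustration. */
static struct S g = { "ab", "defg" };

/* Mode 0: bytes from the pointer to the end of the whole object g. */
static size_t size_whole(void)
{
    return __builtin_object_size(g.a, 0);
}

/* Mode 1: bytes from the pointer to the end of the closest enclosing
   subobject, here the member g.a.  */
static size_t size_sub(void)
{
    size_t n = __builtin_object_size(g.a, 1);
    /* (size_t) -1 is the "unknown" answer for modes 0 and 1. */
    assert(n == (size_t)-1 || n <= sizeof g);
    return n;
}
```

Both calls receive the same address; only the mode argument tells them apart, which is why the information has to be used before the two pointer expressions are unified.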

Yep.  It would be nice to be able to hold on to the type.
In this case it doesn't look like it's a problem since &s[i]
is also the address of the first member of struct S and
those two need to be interchangeable (this also might need
tweaking in the standard).

Martin
