https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86259

--- Comment #17 from Martin Sebor <msebor at gcc dot gnu.org> ---
> Let's give the struct a name and introduce some casts
> 
> typedef struct { int a; int b; } S;
> S s, s2;
> memcpy ((S*)&s2.a, (S*)&s.a, sizeof(s));
> 
> The standard makes it valid to convert &s.a to S* and makes it equivalent to
> &s. Because of the call to memcpy, it gets converted to void* afterwards. So
> you are saying that (void*)&s.a is not the same as (void*)(S*)&s.a (note
> that no arithmetic is involved at this point). This looks like it is going
> to cause trouble...

That code like this exists out there is one of the reasons why GCC allows
memcpy() to cross subobject boundaries, so there is no trouble there[*].  That
the code is invalid shouldn't be surprising: replacing the members with
one-element arrays (as C treats singleton objects) and replacing the memcpy
with direct assignments to the array elements has triggered a -Warray-bounds
warning since GCC 4.6:

  typedef struct { int a[1], b[1]; } S;
  S s, s2;

  void f (void)
  {
    s2.a[1] = s.a[1];   // warning: array subscript 1 is above array bounds of ‘int[1]’ [-Warray-bounds]
  }
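
For comparison, the copy from the quoted example can be written against the
enclosing objects themselves, which sidesteps the subobject question
entirely.  This is only a minimal sketch: S is the struct from the quote, and
copy_S is a hypothetical helper name, not anything GCC provides.

```c
#include <string.h>

typedef struct { int a; int b; } S;

/* Hypothetical helper: copy all of *src into *dst.  Both pointers
   designate the enclosing objects, not their first members, so the
   sizeof *dst bytes copied stay within the objects pointed to.  */
void copy_S (S *dst, const S *src)
{
  memcpy (dst, src, sizeof *dst);
}
```

A call such as copy_S (&s2, &s) then has the same effect as the plain
assignment s2 = s.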

But this report (or the related 86265) isn't about memcpy.  It's about strlen()
accessing elements of multiple consecutive subobjects.  As I showed, using
string functions to cross subobject boundaries has been diagnosed by GCC since
2005 (GCC 4.3) and has affected the generated code with _FORTIFY_SOURCE since
then.  Whenever possible (and except for the memxxx functions), GCC diagnoses
not just writes but also reads that cross subobject boundaries, either by
-Wstringop-overflow (for built-in functions) or by -Warray-bounds (for direct
accesses and for some uses of built-ins).  It doesn't diagnose such invalid
reads by strlen() yet but starting with GCC 8 the -Wstringop-truncation warning
makes an effort to catch some before they happen.  GCC 9 will try to detect
some such reads even by strlen and strnlen (see for example bug 86199).

The appropriate mechanisms for accessing memory across subobject boundaries,
irrespective of object types, are the raw memory functions like memcpy and
memchr (with memchr(..., 0, ...) being the function to use to compute the same
result as strlen).  I would recommend respecting subobject boundaries and
starting with the address of the enclosing object when using those as well but
for now GCC does not diagnose violations.  Uses of all other (typed) functions
should respect object sizes and types.
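
To illustrate the memchr suggestion, here is a rough sketch of computing a
strlen-like result across two adjacent members.  The struct two_bufs and the
function span_len are made-up names for this example, and the search is
bounded by the size of the enclosing object, which is my assumption about
what a caller would want.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: two adjacent character arrays.  */
struct two_bufs { char a[4]; char b[4]; };

/* Return the number of bytes before the first NUL in *p, searching
   from the start of the enclosing object and never past its end.
   Return sizeof *p if the object contains no NUL at all.  */
size_t span_len (const struct two_bufs *p)
{
  const unsigned char *base = (const unsigned char *) p;
  const unsigned char *nul = memchr (base, 0, sizeof *p);
  return nul ? (size_t) (nul - base) : sizeof *p;
}
```

Unlike strlen (p->a), this starts at the address of the enclosing object and
carries an explicit bound, so it does not depend on a terminating NUL being
present in the first member.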

[*] I'd like to see GCC start diagnosing questionable code like that to drive
code changes that will make it easier to do better analysis, detect more bugs,
and ultimately emit even more efficient object code.  It is one of the goals of
the effort by Peter Sewell and others to clarify the C object model to make
this possible.
