On TILE-Gx, I'm observing a regression in inlined memcpy/memset in
gcc 4.6 and later compared with gcc 4.4.  Although I found the problem
on TILE-Gx, I think it affects any architecture with
SLOW_UNALIGNED_ACCESS set to 1.

Consider the following program:

#include <string.h>

struct foo {
  int x;
};

void copy(struct foo* f0, struct foo* f1)
{
  memcpy (f0, f1, sizeof(struct foo));
}

In gcc 4.4, I get the desired inline memcpy:

copy:
        ld4s    r1, r1
        st4     r0, r1
        jrp     lr

In gcc 4.7, however, I get inlined byte-by-byte copies:

copy:
        ld1u_add r10, r1, 1
        st1_add  r0, r10, 1
        ld1u_add r10, r1, 1
        st1_add  r0, r10, 1
        ld1u_add r10, r1, 1
        st1_add  r0, r10, 1
        ld1u     r10, r1
        st1      r0, r10
        jrp      lr

The inlining of memcpy is done in expand_builtin_memcpy in builtins.c.
Tracing through that, I see that the values of src_align and
dest_align, which are computed by get_pointer_alignment, have degraded:
in gcc 4.4 they are 32 bits, but in gcc 4.7 they are 8 bits.  This
causes the loads and stores generated by the inlined memcpy to be
per-byte instead of per-4-byte.
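
To make the effect of the alignment value concrete, here is a small
illustrative model.  It is not GCC's actual expansion logic, and the
names widest_chunk, count_moves and word_bytes are hypothetical.  It
assumes a copy strategy that picks the widest power-of-two chunk
allowed by both the known alignment and the remaining length:

```c
#include <stddef.h>

/* Illustrative model only, not GCC's code.  Returns the widest chunk
   (in bytes) that is no larger than the known alignment, the remaining
   length, or the hypothetical machine word size.  word_bytes is assumed
   to be a power of two. */
static size_t widest_chunk(size_t align_bytes, size_t len, size_t word_bytes)
{
  size_t chunk = word_bytes;
  while (chunk > 1 && (chunk > align_bytes || chunk > len))
    chunk /= 2;
  return chunk;
}

/* Counts the load/store pairs such a strategy would emit for a copy
   of len bytes. */
static size_t count_moves(size_t align_bytes, size_t len, size_t word_bytes)
{
  size_t moves = 0;
  while (len > 0) {
    len -= widest_chunk(align_bytes, len, word_bytes);
    moves++;
  }
  return moves;
}
```

With a 4-byte struct and an 8-byte word, a known alignment of 4 bytes
gives one load/store pair in this model, while a known alignment of
1 byte gives four, which matches the two assembly listings above.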

Looking further, gcc 4.7 uses the "align" field in "struct
ptr_info_def" to compute the alignment.  This field appears to be
initialized in get_ptr_info in tree-ssanames.c but it is always
initialized to 1 byte and does not appear to change.  gcc 4.4 computes
its alignment information differently.

I get the same byte-copies with gcc 4.8 and gcc 4.6.
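
For reference, two source-level variants that might sidestep the lost
alignment information are sketched below.  Whether either one restores
the word-sized copy on TILE-Gx is an assumption that has not been
verified here.  The function names copy_assign and copy_assumed are
hypothetical; __builtin_assume_aligned is a GCC builtin available in
gcc 4.7 and later.

```c
#include <string.h>

struct foo {
  int x;
};

/* Variant 1: plain struct assignment.  The alignment comes from the
   type itself, so no pointer alignment tracking is involved. */
void copy_assign(struct foo *f0, const struct foo *f1)
{
  *f0 = *f1;
}

/* Variant 2: state the alignment explicitly before calling memcpy.
   Assumes both pointers really are aligned for struct foo; if they
   are not, the behavior is undefined. */
void copy_assumed(struct foo *f0, const struct foo *f1)
{
  struct foo *d = __builtin_assume_aligned(f0, __alignof__(struct foo));
  const struct foo *s = __builtin_assume_aligned(f1, __alignof__(struct foo));
  memcpy(d, s, sizeof(struct foo));
}
```

Neither variant addresses the underlying issue in the compiler; they
only give the expander an alignment it can use without relying on
get_pointer_alignment's result for a bare pointer.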

I see a couple of related open PRs, 50417 and 53535, but no suggested
fixes for them yet.  Can anyone advise on how this can be fixed?  Should
I file a new bug, or add this info to one of the existing PRs?

Thanks,

Walter
