On TILE-Gx, I'm observing a regression in the code generated for
inlined memcpy/memset in gcc 4.6 and later versus gcc 4.4. Though I
found the problem on TILE-Gx, I think it affects any architecture
with SLOW_UNALIGNED_ACCESS set to 1.
Consider the following program:
#include <string.h>

struct foo {
  int x;
};

void copy(struct foo* f0, struct foo* f1)
{
  memcpy (f0, f1, sizeof(struct foo));
}
In gcc 4.4, I get the desired inline memcpy:
copy:
	ld4s	r1, r1
	st4	r0, r1
	jrp	lr
In gcc 4.7, however, I get inlined byte-by-byte copies:
copy:
	ld1u_add	r10, r1, 1
	st1_add	r0, r10, 1
	ld1u_add	r10, r1, 1
	st1_add	r0, r10, 1
	ld1u_add	r10, r1, 1
	st1_add	r0, r10, 1
	ld1u	r10, r1
	st1	r0, r10
	jrp	lr
The inlining of memcpy is done in expand_builtin_memcpy in builtins.c.
Tracing through that, I see that src_align and dest_align, which are
computed by get_pointer_alignment, have degraded: in gcc 4.4 they are
32 bits, but in gcc 4.7 they are 8 bits. This causes the loads and
stores generated by the inlined memcpy to be per-byte instead of
per-4-byte.
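
To make the effect of the alignment concrete, here is a small model of how the known alignment limits the width of each load/store when unaligned accesses are slow. This is only an illustrative sketch, not GCC's code: chunk_bytes, copy_pairs and the word_bytes parameter are hypothetical names, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not GCC code: widest chunk (in bytes) a copy may
   use when both pointers are only known to be aligned to align_bits.
   word_bytes is the widest load/store the target offers. */
static size_t chunk_bytes(unsigned align_bits, size_t word_bytes)
{
    assert(align_bits >= 8 && align_bits % 8 == 0);
    size_t chunk = align_bits / 8;
    return chunk < word_bytes ? chunk : word_bytes;
}

/* Number of load/store pairs needed to copy len bytes under that limit. */
static size_t copy_pairs(size_t len, unsigned align_bits, size_t word_bytes)
{
    size_t pairs = 0;
    size_t chunk = chunk_bytes(align_bits, word_bytes);

    while (len > 0) {
        while (chunk > len)
            chunk /= 2;          /* shrink the chunk to fit the tail */
        pairs += len / chunk;
        len %= chunk;
    }
    return pairs;
}
```

In this model, copying the 4-byte struct with 32-bit alignment takes one load/store pair, while 8-bit alignment takes four, which matches the shape of the two assembly listings above.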
Looking further, gcc 4.7 uses the "align" field in "struct
ptr_info_def" to compute the alignment. This field appears to be
initialized in get_ptr_info in tree-ssanames.c but it is always
initialized to 1 byte and does not appear to change. gcc 4.4 computes
its alignment information differently.
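
One possible experiment to check whether the missing alignment information is the whole story would be to state the alignment explicitly with __builtin_assume_aligned, a builtin available from gcc 4.7 onward. The sketch below is an assumption about how such a test could look (copy_aligned is a hypothetical name); whether it restores the 4-byte load/store on TILE-Gx is untested here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct foo {
    int x;
};

/* Same copy as the original test case, but with the alignment stated
   explicitly. Behavior is undefined if the pointers are not actually
   aligned as claimed, hence the debug-only checks. */
void copy_aligned(struct foo *f0, struct foo *f1)
{
    assert((uintptr_t)f0 % __alignof__(struct foo) == 0);
    assert((uintptr_t)f1 % __alignof__(struct foo) == 0);

    struct foo *dst = __builtin_assume_aligned(f0, __alignof__(struct foo));
    struct foo *src = __builtin_assume_aligned(f1, __alignof__(struct foo));

    memcpy(dst, src, sizeof(struct foo));
}
```

If the byte-by-byte copies disappear with this variant, that would point at the pointer alignment tracking rather than at the memcpy expansion itself.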
I get the same byte-copies with gcc 4.8 and gcc 4.6.
I see a couple of related open PRs, 50417 and 53535, but no suggested
fixes for them yet. Can anyone advise on how this can be fixed? Should
I file a new bug, or add this info to one of the existing PRs?
Thanks,
Walter