Hi Walter,

I faced a similar problem when I worked on optimizing memcpy expansion
for x86. The x86-specific expander also needed alignment info, and it
was also incorrect (i.e. too conservative). The routine
get_mem_align_offset () is used there to determine alignment, but at
some point it started to return 1-byte alignment instead of the 16-byte
(or whatever) alignment I expected. I made a small fix for it, and it
seemed to work well again:

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 9565c61..9108022 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1516,6 +1516,14 @@ get_mem_align_offset (rtx mem, unsigned int align)
          if (TYPE_ALIGN (TREE_TYPE (expr)) < (unsigned int) align)
            return -1;
        }
+  else if (TREE_CODE (expr) == MEM_REF)
+    {
+      int al, off;
+      get_object_alignment_1 (expr, &al, &offset);
+      offset /= BITS_PER_UNIT;
+      if (al < align)
+       return -1;
+    }
   else if (TREE_CODE (expr) == COMPONENT_REF)
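To make the intent of the MEM_REF branch easier to follow, here is a
small standalone model of the arithmetic in C. This is only a sketch and
not GCC code: the function name mem_ref_align_offset and its parameters
are made up for illustration. It assumes the known alignment and the bit
position are both given in bits (which is how I understand
get_object_alignment_1 to report them), and it assumes the result is
finally reduced modulo the requested alignment in bytes.

```c
#include <assert.h>

#define BITS_PER_UNIT 8 /* assumption: 8-bit bytes */

/* Hypothetical model, not part of GCC.
   known_align_bits: alignment (in bits) that is known for the object.
   bitpos:           bit offset of the access from that aligned address.
   want_align_bits:  alignment (in bits) the caller asks about.
   Returns the byte offset of the access from a want_align_bits-aligned
   address, or -1 if the known alignment is too weak to tell.  */
int
mem_ref_align_offset (unsigned int known_align_bits,
                      unsigned long bitpos,
                      unsigned int want_align_bits)
{
  /* The requested alignment must be a power of two of at least one byte.  */
  assert (want_align_bits >= BITS_PER_UNIT
          && (want_align_bits & (want_align_bits - 1)) == 0);

  /* Too little is known about the object: give up, as the patch does.  */
  if (known_align_bits < want_align_bits)
    return -1;

  /* Convert the bit position to bytes, then keep only the part below
     the requested alignment.  */
  unsigned long offset = bitpos / BITS_PER_UNIT;
  return (int) (offset & (want_align_bits / BITS_PER_UNIT - 1));
}
```

With these assumptions, mem_ref_align_offset (128, 32, 128) gives 4 (the
access sits 4 bytes past a 16-byte boundary), while
mem_ref_align_offset (8, 0, 128) gives -1, because only byte alignment
is known.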

So, returning to your problem: the routines you mentioned probably don't
handle MEM_REF either (and before some commit they didn't have to). You
could also look into the routine I mentioned; you might find something
useful there.

---
Thanks, Michael

On 2 October 2012 18:19, Walter Lee <w...@tilera.com> wrote:
>
> On TILE-Gx, I'm observing a degradation in inlined memcpy/memset in
> gcc 4.6 and later versus gcc 4.4. Though I find the problem on
> TILE-Gx, I think this is a problem for any architectures with
> SLOW_UNALIGNED_ACCESS set to 1.
>
> Consider the following program:
>
> struct foo {
>   int x;
> };
>
> void copy(struct foo* f0, struct foo* f1)
> {
>   memcpy (f0, f1, sizeof(struct foo));
> }
>
> In gcc 4.4, I get the desired inline memcpy:
>
> copy:
>         ld4s    r1, r1
>         st4     r0, r1
>         jrp     lr
>
> In gcc 4.7, however, I get inlined byte-by-byte copies:
>
> copy:
>         ld1u_add r10, r1, 1
>         st1_add  r0, r10, 1
>         ld1u_add r10, r1, 1
>         st1_add  r0, r10, 1
>         ld1u_add r10, r1, 1
>         st1_add  r0, r10, 1
>         ld1u     r10, r1
>         st1      r0, r10
>         jrp      lr
>
> The inlining of memcpy is done in expand_builtin_memcpy in builtins.c.
> Tracing through that, I see that the alignment of src_align and
> dest_align, which is computed by get_pointer_alignment, has degraded:
> in gcc 4.4 they are 32 bits, but in gcc 4.7 they are 8 bits. This
> causes the loads generated by the inlined memcopy to be per-byte
> instead of per-4-byte.
>
> Looking further, gcc 4.7 uses the "align" field in "struct
> ptr_info_def" to compute the alignment. This field appears to be
> initialized in get_ptr_info in tree-ssanames.c but it is always
> initialized to 1 byte and does not appear to change. gcc 4.4 computes
> its alignment information differently.
>
> I get the same byte-copies with gcc 4.8 and gcc 4.6.
>
> I see a couple related open PRs: 50417, 53535, but no suggested fixes
> for them yet. Can anyone advise on how this can be fixed? Should I
> file a new bug, or add this info to one of the existing PRs?
>
> Thanks,
>
> Walter
>

--
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.