On 29 March 2012 22:10, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
> This patch takes a different approach to fixing PR52543 than does the patch
> in
> http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00641.html
> This patch transforms the lower-subreg pass(es) from unconditionally
> splitting wide moves, zero extensions, and shifts, so that it now takes into
> account the target specific costs and only does the transformations if it is
> profitable.
> Unconditional splitting is a problem that not only occurs on the AVR but is
> also a problem on the ARM NEON and my private port.  Furthermore, it is a
> problem that is likely to occur on most modern larger machines since these
> machines are more likely to have fast instructions for moving things that
> are larger than word mode.

Nice - this means that at least one pending patch for subreg-style
operations for NEON intrinsics can go in after appropriate tweaking
of costs. It probably needs some further tuning and benchmarking on ARM,
but the case where we saw such spills to the stack with subreg-style
operations is now much improved, indicating that the existing costs
infrastructure manages to get this right at least for this case.

Richard(S) - if you remember your PR48941 patch: after applying the
lower-subreg patch I now see far better code than what one gets out of
-fno-split-wide-types, and a lot of that gratuitous spilling has gone away.

There are still too many moves between NEON registers, but there are
far fewer moves to the integer side, and the gratuitous spilling is now gone.

old on left - new on right (i.e. Kenneth's patch + Richard's PR48941 patch):
cross:                                                          cross:
        @ args = 0, pretend = 0, frame = 16           |         @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, uses_anonymous_args = 0   |         @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.                        @ link register save eliminated.
        str     fp, [sp, #-4]!                        |         vldmia  r0, {d26-d27}
        add     fp, sp, #0                            |         vldmia  r1, {d24-d25}
        sub     sp, sp, #20                           |         vmov    q10, q13  @ v4sf
        vldmia  r0, {d16-d17}                         |         vmov    q11, q13  @ v4sf
        vmov    q10, q8  @ v4sf                       |         vmov    q8, q12  @ v4sf
        sub     sp, sp, #48                           |         vmov    q9, q12  @ v4sf
                                                      >         vzip.32 q10, q11
                                                      >         vzip.32 q8, q9
                                                      >         vmov    q14, q10  @ v4sf
        vmov    q12, q8  @ v4sf                                 vmov    q12, q8  @ v4sf
        add     r3, sp, #15                           |         vmov    d21, d22  @ v2sf
        bic     r3, r3, #15                           |         vmul.f32        d16, d29, d18
        vzip.32 q10, q12                              |         vmul.f32        d17, d21, d24
        vstmia  r3, {d20-d21}                         |         vmov    d19, d18  @ v2sf
        vstr    d24, [r3, #16]                        |         vmul.f32        d18, d28, d25
        vstr    d25, [r3, #24]                        |         vmls.f32        d16, d21, d25
        vldmia  r1, {d16-d17}                         |         vmls.f32        d17, d28, d19
        vmov    q9, q8  @ v4sf                        |         vmls.f32        d18, d29, d24
        vmov    q11, q8  @ v4sf                       |         vmov    d26, d16  @ v2sf
        vzip.32 q9, q11                               |         vmov    d27, d17  @ v2sf
        vstmia  r3, {d18-d19}                         |         vmov    d17, d18  @ v2sf
        vstr    d22, [r3, #16]                        |         vuzp.32 d26, d27
        vstr    d23, [r3, #24]                        |         vmov    d16, d26  @ v2sf
        vmov    d25, d18  @ v2sf                      |         vmov    r0, r1, d16  @ v4sf
        vmul.f32        d17, d21, d22                 |         vmov    r2, r3, d17
        vmul.f32        d18, d24, d18                 <
        vmov    d16, d19  @ v2sf                      <
        vmul.f32        d19, d20, d19                 <
        vmls.f32        d17, d24, d16                 <
        vmls.f32        d18, d20, d22                 <
        vmls.f32        d19, d21, d25                 <
        vuzp.32 d17, d18                              <
        vmov    d20, d17  @ v2sf                      <
        vmov    d21, d19  @ v2sf                      <
        vmov    r0, r1, d20  @ v4sf                   <
        vmov    r2, r3, d21                           <
        add     sp, fp, #0                            <
        ldmfd   sp!, {fp}                             <
        bx      lr                                              bx      lr
