On 29 March 2012 22:10, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
> This patch takes a different approach to fixing PR52543 than does the patch
> in
> http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00641.html
> This patch transforms the lower-subreg pass(es) from unconditionally
> splitting wide moves, zero extensions, and shifts, so that it now takes into
> account the target specific costs and only does the transformations if it is
> profitable.
> Unconditional splitting is a problem that not only occurs on the AVR but is
> also a problem on the ARM NEON and my private port.  Furthermore, it is a
> problem that is likely to occur on most modern larger machines since these
> machines are more likely to have fast instructions for moving things that
> are larger than word mode.

Nice - this means that at least one pending patch for subreg-style
operations for NEON intrinsics can go in after appropriate tweaking
of costs. It probably needs some further tuning and benchmarking on ARM,
but the case where we saw such spills to the stack with subreg-style
operations is now much improved, indicating that the existing costs
infrastructure manages to get this right at least for this case.

Richard(S) - if you remember your PR48941 patch: after applying the
lower-subreg patch I now see far better code than what one gets out of
-fno-split-wide-types, and a lot of that gratuitous spilling has gone away.

There are still too many moves between NEON registers, but there are
far fewer moves to the integer side, and the gratuitous spilling is now gone.

old on left - new on right (i.e. Kenneth's patch + Richard's PR48941 patch):
cross:                                                          cross:
        @ args = 0, pretend = 0, frame = 16           |         @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, uses_anonymous_args = 0   |         @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.                        @ link register save eliminated.
        str     fp, [sp, #-4]!                        |         vldmia  r0, {d26-d27}
        add     fp, sp, #0                            |         vldmia  r1, {d24-d25}
        sub     sp, sp, #20                           |         vmov    q10, q13  @ v4sf
        vldmia  r0, {d16-d17}                         |         vmov    q11, q13  @ v4sf
        vmov    q10, q8  @ v4sf                       |         vmov    q8, q12  @ v4sf
        sub     sp, sp, #48                           |         vmov    q9, q12  @ v4sf
                                                      >         vzip.32 q10, q11
                                                      >         vzip.32 q8, q9
                                                      >         vmov    q14, q10  @ v4sf
        vmov    q12, q8  @ v4sf                                 vmov    q12, q8  @ v4sf
        add     r3, sp, #15                           |         vmov    d21, d22  @ v2sf
        bic     r3, r3, #15                           |         vmul.f32        d16, d29, d18
        vzip.32 q10, q12                              |         vmul.f32        d17, d21, d24
        vstmia  r3, {d20-d21}                         |         vmov    d19, d18  @ v2sf
        vstr    d24, [r3, #16]                        |         vmul.f32        d18, d28, d25
        vstr    d25, [r3, #24]                        |         vmls.f32        d16, d21, d25
        vldmia  r1, {d16-d17}                         |         vmls.f32        d17, d28, d19
        vmov    q9, q8  @ v4sf                        |         vmls.f32        d18, d29, d24
        vmov    q11, q8  @ v4sf                       |         vmov    d26, d16  @ v2sf
        vzip.32 q9, q11                               |         vmov    d27, d17  @ v2sf
        vstmia  r3, {d18-d19}                         |         vmov    d17, d18  @ v2sf
        vstr    d22, [r3, #16]                        |         vuzp.32 d26, d27
        vstr    d23, [r3, #24]                        |         vmov    d16, d26  @ v2sf
        vmov    d25, d18  @ v2sf                      |         vmov    r0, r1, d16  @ v4sf
        vmul.f32        d17, d21, d22                 |         vmov    r2, r3, d17
        vmul.f32        d18, d24, d18                 <
        vmov    d16, d19  @ v2sf                      <
        vmul.f32        d19, d20, d19                 <
        vmls.f32        d17, d24, d16                 <
        vmls.f32        d18, d20, d22                 <
        vmls.f32        d19, d21, d25                 <
        vuzp.32 d17, d18                              <
        vmov    d20, d17  @ v2sf                      <
        vmov    d21, d19  @ v2sf                      <
        vmov    r0, r1, d20  @ v4sf                   <
        vmov    r2, r3, d21                           <
        add     sp, fp, #0                            <
        ldmfd   sp!, {fp}                             <
        bx      lr                                              bx      lr
