One of the problems with ivopts is that the auto-increment modelling
only takes into account whether HAVE_PRE_INC and friends are defined
for the architecture. However, on ARM the VFP addressing modes don't
really support the PRE_INCREMENT and POST_DECREMENT forms, and hence
there is a bias in ivopts to prefer pre-increment forms over all else.
The attached patch attempts to fix this. In general it makes things
better on ARM, where in a large number of cases we have rather
embarrassing code generation around array accesses of floating point
values: to honor this choice of auto-increment form, the compiler is
forced to move things back and forth between floating point and
integer registers, and similar.
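
As a rough illustration of the idea (this is not the code in the patch),
the difference is between a per-target yes/no answer and one that also
looks at the mode of the access. Every name below is hypothetical and
made up for this sketch. The set of forms accepted for VFP accesses is an
assumption based only on the statement above that PRE_INCREMENT and
POST_DECREMENT are not really supported there.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these are not GCC's types. */
enum mode_class { MODE_INT, MODE_FLOAT };
enum autoinc_form { PRE_INC_FORM, POST_INC_FORM, PRE_DEC_FORM, POST_DEC_FORM };

/* Old-style model: one answer per target, regardless of the access mode.
   Here the pretend target claims to support every form.  */
static bool
target_has_autoinc (enum autoinc_form form)
{
  (void) form;
  return true;
}

/* Mode-aware model: with hard float, floating point accesses only get
   the forms assumed usable on the VFP side (post-increment and
   pre-decrement).  Integer accesses keep whatever the target offers.  */
static bool
autoinc_ok_for_mode (enum mode_class mc, enum autoinc_form form,
                     bool hard_float)
{
  if (!target_has_autoinc (form))
    return false;
  if (mc == MODE_FLOAT && hard_float)
    return form == POST_INC_FORM || form == PRE_DEC_FORM;
  return true;
}
```

With a predicate of this shape, a candidate such as a pre-increment
address for a float load would simply not be offered in hard-float mode,
instead of being chosen and then worked around later.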

The canonical example for this is

 void foo (float *x, float *y, float *z, float *m, int l)
 {
       int i;
       for (i = 0; i < l; i++)
         z[i] = x[i] * y[i] + m[i];
 }

Before the patch we generate:

 sub r0, r0, #4
 sub r1, r1, #4
 sub r3, r3, #4
 add ip, r2, ip, asl #2
 add r3, r3, #4
 add r0, r0, #4
 flds s15, [r3, #0]
 flds s13, [r0, #0]
 add r1, r1, #4
 flds s14, [r1, #0]
 fmacs s15, s13, s14
 mov r4, r3
 fstmias r2!, {s15}
 cmp r2, ip
 bne .L3
 ldmfd sp!, {r4}
 bx lr

and after the patch we generate:

 @ args = 4, pretend = 0, frame = 0
 @ frame_needed = 0, uses_anonymous_args = 0
 @ link register save eliminated.
 ldr ip, [sp, #0]
 cmp ip, #0
 bxle lr
 add ip, r0, ip, asl #2
 fldmias r0!, {s13}
 fldmias r1!, {s14}
 fldmias r3!, {s15}
 fmacs s15, s13, s14
 cmp r0, ip
 fstmias r2!, {s15}
 bne .L3
 bx lr
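
In source terms, the loop after the patch behaves roughly as if it had
been written with one post-incremented pointer per array. The sketch
below is only my hand-written reading of the assembly above, not
compiler output; the names foo_indexed and foo_postinc and the const
qualifiers are added for illustration.

```c
/* Indexed form, as in the original example.  */
void
foo_indexed (const float *x, const float *y, float *z, const float *m, int l)
{
  int i;
  for (i = 0; i < l; i++)
    z[i] = x[i] * y[i] + m[i];
}

/* Pointer-bumping form: each array is walked with its own pointer and
   every access post-increments it, mirroring the fldmias/fstmias
   sequence.  The loop ends when x reaches its end address, like the
   "cmp r0, ip" in the listing.  */
void
foo_postinc (const float *x, const float *y, float *z, const float *m, int l)
{
  const float *end;

  if (l <= 0)           /* matches the early "bxle lr" */
    return;
  end = x + l;
  do
    {
      float a = *x++;
      float b = *y++;
      float c = *m++;
      *z++ = a * b + c;
    }
  while (x != end);
}
```

Both functions compute the same values; the second just makes the
addressing that the patched compiler picks explicit.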

In general, ivopts could do with some TLC in this area. Looking at
the code generated for most of SPEC2k, I see a general improvement in
performance on an A9 board, with a large number of cases of transfers
back and forth between VFP and integer registers much reduced (I saw
up to a 6% improvement in mgrid and 3% in facerec), and overall up to
a 1% improvement when this patch was applied to the Linaro 4.6 tree.
Looking at object files with the same patch applied on FSF trunk, I
see similar transformations as on the 4.6 tree. I see some funny
behaviour with twolf, where there is noise in the results, and I'm not
confident of that particular result.

In the interest of full disclosure, while looking at mgrid I noticed
a few cases where we were moving more values from the integer to the
VFP side, but overall I think this patch benefits more than it harms.
These appeared to be around the areas where a floating point array was
being zero initialized. Given that the VFP instruction set doesn't
really have a zero initializer form, we were moving the value 0 into
an integer register and then moving it into a VFP register, rather
than just choosing the integer side register store. I am not yet sure
why that is happening, and that's something I'm investigating. Before
that, I wanted some feedback on this patch as it stands today, as I
believe it's reached a stage where it appears to be performing
reasonably.

I did experiment with costs, and more generally with turning off these
auto-increment forms for the FP modes when we are not in soft-float
mode, but nothing appeared to behave as well as the attached patch.

Thoughts and comments would be welcome. I don't know of any other
architectures where this will be applicable.



        * tree-ssa-loop-ivopts.c (add_autoinc_candidates, get_address_cost):
        Replace use of HAVE_{POST/PRE}_{INCREMENT/DECREMENT} with
        * config/arm/arm.h (ARM_AUTOINC_VALID_FOR_MODE_P): New.
        (USE_LOAD_POST_INCREMENT): Define.
        (USE_LOAD_PRE_INCREMENT): Define.
        (USE_LOAD_POST_DECREMENT): Define.
        (USE_LOAD_PRE_DECREMENT): Define.
        (USE_STORE_PRE_DECREMENT): Define.
        (USE_STORE_PRE_INCREMENT): Define.
        (ARM_POST_INC): Define.
        (ARM_PRE_INC): Define.
        (ARM_PRE_DEC): Define.
        (ARM_POST_DEC): Define.
        * config/arm/arm-protos.h (arm_autoinc_modes_ok_p): Declare.
        * config/arm/arm.c (arm_autoinc_modes_ok_p): Define.
