------- Additional Comments From pinskia at gcc dot gnu dot org  2005-01-19 23:38 -------
We now get:
L33:
        lfd f13,0(r11)
        add r11,r11,r8
        lfd f0,0(r10)
        addi r10,r10,8
        fmadd f0,f13,f0,f12
        fmr f12,f0
        bdnz L33
Which is much better, thanks Zdenek.  The only problem left looks like a coalescing problem in out of SSA.

Before out of SSA:
  # ivtmp.89_54 = PHI <ivtmp.89_31(2), ivtmp.89_33(3)>;
  # lsm_tmp.85_52 = PHI <lsm_tmp.85_53(2), D.538_49(3)>;
  # k_3 = PHI <1(2), k_7(3)>;
<L4>:;
  D.529_38 = k_3 * stride.10_5;
  D.530_39 = i_2 + D.529_38;
  D.531_40 = offset.11_9 + D.530_39;
  D.532_42 = (*a_41)[D.531_40];
  b_30 = ivtmp.89_54;
  D.536_47 = *b_30;
  D.537_48 = D.532_42 * D.536_47;
  D.538_49 = D.537_48 + lsm_tmp.85_52;
  D.773_12 = (<unnamed type>) k_3;
  D.774_8 = D.773_12 + 1;
  k_7 = (int4) D.774_8;
  ivtmp.89_33 = ivtmp.89_54 + 8B;
  D.771_18 = (<unnamed type>) k_7;
  D.772_17 = D.771_18 + 4294967295;
  k_13 = (int4) D.772_17;
  if (stride.10_5 == k_13) goto <L25>; else goto <L4>;

After:
<L4>:;
  D.538 = (*a)[offset.11 + i + k * stride.10] * *ivtmp.89 + lsm_tmp.85;
  k = (int4) ((<unnamed type>) k + 1);
  ivtmp.89 = ivtmp.89 + 8B;
  lsm_tmp.85 = D.538;
  if (stride.10 == (int4) ((<unnamed type>) k + 4294967295)) goto <L25>; else goto <L4>;

We should have coalesced lsm_tmp.85 and D.538 together, since the PHI for lsm_tmp.85_52 takes D.538_49 on the back edge; the leftover copy lsm_tmp.85 = D.538 is presumably what shows up as the fmr in the loop.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14741
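
For illustration only, here is a rough C sketch of what the missing coalescing amounts to at source level. The names dot_with_copy and dot_coalesced are made up here, and the indexing is simplified to a[k * stride] (the offset and i terms of the real testcase are dropped), so this is not the code from the bug. A compiler will most likely emit the same loop for both functions when given this C directly; the point is only to show the shape of the loop before and after the two names share one variable.

```c
#include <stddef.h>

/* Hypothetical sketch: the loop as it looks after out of SSA today.
   The sum lands in its own temporary (d538) and is then copied into
   the accumulator, mirroring "lsm_tmp.85 = D.538". */
double dot_with_copy(const double *a, ptrdiff_t stride,
                     const double *b, int n)
{
    double lsm_tmp = 0.0;
    for (int k = 0; k < n; k++) {
        double d538 = a[k * stride] * b[k] + lsm_tmp;
        lsm_tmp = d538;   /* the extra copy (presumably the fmr) */
    }
    return lsm_tmp;
}

/* Hypothetical sketch: the same loop with lsm_tmp and d538 coalesced
   into one variable, so no copy is left in the loop body. */
double dot_coalesced(const double *a, ptrdiff_t stride,
                     const double *b, int n)
{
    double acc = 0.0;
    for (int k = 0; k < n; k++)
        acc = a[k * stride] * b[k] + acc;
    return acc;
}
```

Both functions compute the same value; only the number of names carrying the running sum differs.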