On Thu, Aug 27, 2020 at 9:17 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> >On 2020-08-26 5:23 p.m., Roger Sayle wrote:
> >> These more accurate target rtx_costs are used by
> >> gimple-ssa-strength-reduction.c (via a call to mult_by_coeff_cost) to
> >> decide whether applying strength reduction would be profitable.  This
> >> test case, slsr-13.c, assumes that two multiplications by four are
> >> cheaper than two multiplications by five.  I believe this is not the
> >> case on hppa, which has a sh2add instruction that performs a
> >> multiplication by five in one cycle, exactly the same cost as
> >> performing a left shift by two (i.e. a multiplication by four).  Oddly,
> >> I also believe this isn't the case on x86_64, where the similar lea
> >> instruction is (sometimes) as efficient as a left shift by two bits.
> >This looks like a regression.
> >
> >gcc-10 (prepatch):
> >
> >        addl %r25,%r26,%r28
> >        sh2addl %r25,%r28,%r25
> >        sh2addl %r26,%r28,%r26
> >        addl %r26,%r28,%r28
> >        bv %r0(%r2)
> >        addl %r28,%r25,%r28
> >
> >  <bb 2> [local count: 1073741824]:
> >  x1_4 = c_2(D) + s_3(D);
> >  slsr_11 = s_3(D) * 4;
> >  x2_6 = x1_4 + slsr_11;
> >  slsr_12 = c_2(D) * 4;
> >  x3_8 = x1_4 + slsr_12;
> >  _1 = x1_4 + x2_6;
> >  x_9 = _1 + x3_8;
> >  return x_9;
> >
> >gcc-11 (with patch):
> >
> >        addl %r25,%r26,%r19
> >        sh2addl %r26,%r26,%r28
> >        addl %r28,%r25,%r28
> >        sh2addl %r25,%r25,%r25
> >        addl %r28,%r19,%r28
> >        addl %r25,%r26,%r26
> >        bv %r0(%r2)
> >        addl %r28,%r26,%r28
> >
> >  <bb 2> [local count: 1073741824]:
> >  x1_4 = c_2(D) + s_3(D);
> >  a2_5 = s_3(D) * 5;
> >  x2_6 = c_2(D) + a2_5;
> >  a3_7 = c_2(D) * 5;
> >  x3_8 = s_3(D) + a3_7;
> >  _1 = x1_4 + x2_6;
> >  x_9 = _1 + x3_8;
> >  return x_9;
> >
> > Regards,
> > Dave
>
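
For reference, the two assembly sequences above can be modelled in C to
confirm that they compute the same value with five versus seven arithmetic
instructions.  This is only a sketch: sh2add() is a hypothetical helper
standing in for sh2addl (shift the first operand left by two, add the
second), the function names are made up, and unsigned is used so that
wraparound is well defined.

```c
/* Hypothetical model of PA's sh2addl: (a << 2) + b. */
static unsigned sh2add(unsigned a, unsigned b) { return (a << 2) + b; }

/* gcc-10 sequence: 5 arithmetic instructions. */
static unsigned seq_gcc10(unsigned r26, unsigned r25)
{
  unsigned r28 = r25 + r26;   /* addl    %r25,%r26,%r28 */
  r25 = sh2add(r25, r28);     /* sh2addl %r25,%r28,%r25 */
  r26 = sh2add(r26, r28);     /* sh2addl %r26,%r28,%r26 */
  r28 = r26 + r28;            /* addl    %r26,%r28,%r28 */
  return r28 + r25;           /* addl    %r28,%r25,%r28 (delay slot) */
}

/* gcc-11 sequence: 7 arithmetic instructions. */
static unsigned seq_gcc11(unsigned r26, unsigned r25)
{
  unsigned r19 = r25 + r26;          /* addl    %r25,%r26,%r19 */
  unsigned r28 = sh2add(r26, r26);   /* sh2addl %r26,%r26,%r28 */
  r28 = r28 + r25;                   /* addl    %r28,%r25,%r28 */
  r25 = sh2add(r25, r25);            /* sh2addl %r25,%r25,%r25 */
  r28 = r28 + r19;                   /* addl    %r28,%r19,%r28 */
  r26 = r25 + r26;                   /* addl    %r25,%r26,%r26 */
  return r28 + r26;                  /* addl    %r28,%r26,%r28 (delay slot) */
}
```

Both reduce to 7*(r25+r26); the difference is only in how many adds are
needed to get there.
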
> There are two interesting (tree-optimization) observations here.  The
> first is that at the tree-ssa level both of these gimple sequences look
> to have exactly the same cost: seven assignments on a target where *4 is
> the same cost as *5.  The gimple doesn't attempt to model the sh?add/lea
> instructions that combine may find, so at RTL expansion both sequences
> look equivalent.  One fix may be to have gimple-ssa-strength-reduction.c
> just prefer multiplications by 2, 4 and 8, even on targets that have a
> single cycle "mul" instruction.
>
> The second observation is why tree-ssa-reassoc.c isn't doing something
> here.  The test case is evaluating (s+c)+(s+5*c)+(5*s+c), and this
> strength reduction test is expecting this to turn into
> "tmp=s+c;  return tmp+(tmp+4*c)+(4*s+tmp)", which is clever and an
> improvement, but overlooks the obvious reassociation 7*(s+c).  Indeed
> LLVM does this in three instructions:

reassoc doesn't work on signed types

>
>         tmp1 = s+c;
>         tmp2 = tmp1<<3;
>         return tmp2-tmp1;
>
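
To make the algebra concrete, here is a small C sketch of the three forms
discussed: the expression as written, the shape the strength reduction test
expects, and the 7*(s+c) reassociation.  The function names are made up for
illustration, and the operands are unsigned because reassoc leaves signed
types alone; with unsigned arithmetic the three forms agree for every
input, including ones that wrap.

```c
/* Expression as evaluated by the test case. */
static unsigned f_orig(unsigned s, unsigned c)
{
  return (s + c) + (s + 5 * c) + (5 * s + c);
}

/* Shape the strength reduction test expects: multiplies by 4 only. */
static unsigned f_slsr(unsigned s, unsigned c)
{
  unsigned tmp = s + c;
  return tmp + (tmp + 4 * c) + (4 * s + tmp);
}

/* Reassociated form: 7*(s+c) as a shift and a subtract. */
static unsigned f_reassoc(unsigned s, unsigned c)
{
  unsigned tmp1 = s + c;
  unsigned tmp2 = tmp1 << 3;
  return tmp2 - tmp1;
}
```
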
> Although the PA backend is (mostly) innocent in this, the lowest impact
> fix/workaround is to have multiplications by 2, 4 and 8 return
> COSTS_N_INSNS(1)-1, to indicate a preference when breaking ties.  I'll
> prepare a patch.
>
> Roger
> --
>
>
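
A rough sketch of the tie-breaking idea proposed above, not the actual pa.c
change: mult_const_cost() is a hypothetical helper that only models
coefficients a single shift or sh?add can handle, and COSTS_N_INSNS is
reproduced here as GCC's rtl.h defines it, to the best of my recollection.

```c
/* Assumed to match GCC's rtl.h definition. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: cost of multiplying by a small constant on a
   target where shifts and sh?add each take one instruction.  Returns -1
   for coefficients this sketch does not model.  */
static int mult_const_cost(long coeff)
{
  switch (coeff)
    {
    case 2: case 4: case 8:
      /* Plain shifts: one instruction, nudged down to win ties.  */
      return COSTS_N_INSNS(1) - 1;
    case 3: case 5: case 9:
      /* sh1add/sh2add/sh3add with both operands equal.  */
      return COSTS_N_INSNS(1);
    default:
      return -1;
    }
}
```

With this, *4 compares strictly cheaper than *5 while both still round to
one instruction.
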
