Re: [PATCH] LoongArch: Increase division costs

Xi Ruoyao Wed, 27 Mar 2024 05:42:57 -0700

On Wed, 2024-03-27 at 18:39 +0800, Xi Ruoyao wrote:
> On Wed, 2024-03-27 at 10:38 +0800, chenglulu wrote:
> > 
> > 在 2024/3/26 下午5:48, Xi Ruoyao 写道:
> > > The latency of LA464 and LA664 division instructions depends on the
> > > input.  When I updated the costs in r14-6642, I unintentionally set the
> > > division costs to the best-case latency (when the first operand is 0).
> > > Per a recent discussion [1] we should use "something sensible" instead
> > > of it.
> > > 
> > > Use the average of the minimum and maximum latency observed instead.
> > > This enables multiplication to reciprocal sequence reduction and speeds
> > > up the following test case for about 30%:
> > > 
> > >      int
> > >      main (void)
> > >      {
> > >        unsigned long stat = 0xdeadbeef;
> > >        for (int i = 0; i < 100000000; i++)
> > >          stat = (stat * stat + stat * 114514 + 1919810) % 1000000007;
> > >        asm(""::"r"(stat));
> > >      }
> > > 
> > > [1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648348.html
> > 
> > The test case div-const-reduction.c is modified to assemble the instruction
> > sequence as follows:
> >     lu12i.w $r12,999997440>>12                      # 0x3b9ac000
> >     ori     $r12,$r12,2567
> >     mod.w   $r13,$r13,$r12
> > 
> > This sequence of instructions takes 5 clock cycles.


It actually may take 5 to 8 cycles depending on the input.  And
multiplication is fully pipelined while division is not, so the
reciprocal sequence should still produce a better throughput.

> Hmm indeed, it seems a waste to do this reduction for int / 1000000007.
> I'll try to make a better heuristic as Richard suggests...

Oops, it seems impossible (w/o refactoring the generic code).  See my
reply to Richi :(.

Can you also try benchmarking with the costs of SI and DI division
increased to (10, 10) instead of (14, 22) - allowing more CSE but not
reciprocal sequence reduction, and (10, 22) - only allowing reduction
for DI but not SI?

-- 
Xi Ruoyao <xry...@xry111.site>
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] LoongArch: Increase division costs

Reply via email to