On Wed, Mar 27, 2024 at 1:20 PM Xi Ruoyao <xry...@xry111.site> wrote:
>
> On Wed, 2024-03-27 at 08:54 +0100, Richard Biener wrote:
> > On Tue, Mar 26, 2024 at 10:52 AM Xi Ruoyao <xry...@xry111.site> wrote:
> > >
> > > The latency of LA464 and LA664 division instructions depends on the
> > > input.  When I updated the costs in r14-6642, I unintentionally set the
> > > division costs to the best-case latency (when the first operand is 0).
> > > Per a recent discussion [1] we should use "something sensible" instead
> > > of it.
> > >
> > > Use the average of the minimum and maximum latency observed instead.
> > > This enables the reduction of a division to a multiply-by-reciprocal
> > > sequence and speeds up the following test case by about 30%:
> > >
> > >     int
> > >     main (void)
> > >     {
> > >       unsigned long stat = 0xdeadbeef;
> > >       for (int i = 0; i < 100000000; i++)
> > >         stat = (stat * stat + stat * 114514 + 1919810) % 1000000007;
> > >       asm(""::"r"(stat));
> > >     }
> >
> > I think you should be able to see a constant divisor and thus could do
> > better than return the same latency for everything.  For non-constant
> > divisors using the best-case latency shouldn't be a problem.
>
> Hmm, it doesn't seem really possible as of now.  expand_divmod does
> something like:
>
>   max_cost = (unsignedp
>           ? udiv_cost (speed, compute_mode)
>           : sdiv_cost (speed, compute_mode));
>
> which is reading the pre-calculated costs from a table.  Thus we don't
> really know the denominator and cannot estimate the cost based on it :(.

Ah, too bad.  OTOH, for the case it actually decomposes, it could compute
the real cost and avoid the table, which is filled with reg-reg operations only.
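
For reference, here is a rough sketch of the kind of sequence whose real cost
would be compared against the division: a modulo by the constant from the
test case done with a multiply-high and a correction step.  This is a
Barrett-style illustration, not the exact sequence expand_divmod emits; the
function names are made up for the example, and it assumes a 64-bit
unsigned long and the GCC unsigned __int128 extension.

```c
#include <assert.h>
#include <stdint.h>

#define DIVISOR UINT64_C(1000000007)

/* floor(2^64 / DIVISOR); a compiler would fold this at compile time.  */
static uint64_t
reciprocal (void)
{
  return (uint64_t) ((((unsigned __int128) 1) << 64) / DIVISOR);
}

/* n % DIVISOR using a multiply-high instead of a division.  */
static uint64_t
mod_by_reciprocal (uint64_t n)
{
  /* High half of the 64x64 product: an estimate of n / DIVISOR that is
     never too large and at most slightly too small.  */
  uint64_t q = (uint64_t) (((unsigned __int128) n * reciprocal ()) >> 64);
  uint64_t r = n - q * DIVISOR;

  /* Correct the underestimate.  */
  while (r >= DIVISOR)
    r -= DIVISOR;
  return r;
}

/* Run a shortened version of the test-case loop and cross-check every
   step against the plain % operator.  */
static uint64_t
check_against_divider (int iterations)
{
  uint64_t stat = 0xdeadbeef;
  for (int i = 0; i < iterations; i++)
    {
      uint64_t next = stat * stat + stat * 114514 + 1919810;
      assert (mod_by_reciprocal (next) == next % DIVISOR);
      stat = next % DIVISOR;
    }
  return stat;
}
```

The cost of such a sequence is one multiply-high, one multiply, a subtract
and a compare, which is what would have to be weighed against the latency
of the division instruction.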

> CSE does invoke the cost hook with the actual (mod (a, (const_int
> 1000000007))) RTX, but that's less important.
>
> --
> Xi Ruoyao <xry...@xry111.site>
> School of Aerospace Science and Technology, Xidian University
