On Wed, Mar 27, 2024 at 1:20 PM Xi Ruoyao <xry...@xry111.site> wrote:
>
> On Wed, 2024-03-27 at 08:54 +0100, Richard Biener wrote:
> > On Tue, Mar 26, 2024 at 10:52 AM Xi Ruoyao <xry...@xry111.site> wrote:
> > >
> > > The latency of LA464 and LA664 division instructions depends on the
> > > input. When I updated the costs in r14-6642, I unintentionally set the
> > > division costs to the best-case latency (when the first operand is 0).
> > > Per a recent discussion [1] we should use "something sensible" instead
> > > of it.
> > >
> > > Use the average of the minimum and maximum latency observed instead.
> > > This enables multiplication to reciprocal sequence reduction and speeds
> > > up the following test case for about 30%:
> > >
> > > int
> > > main (void)
> > > {
> > >   unsigned long stat = 0xdeadbeef;
> > >   for (int i = 0; i < 100000000; i++)
> > >     stat = (stat * stat + stat * 114514 + 1919810) % 1000000007;
> > >   asm(""::"r"(stat));
> > > }
> >
> > I think you should be able to see a constant divisor and thus could do
> > better than return the same latency for everything. For non-constant
> > divisors using the best-case latency shouldn't be a problem.
>
> Hmm, it seems not really possible as at now. expand_divmod does
> something like:
>
>       max_cost = (unsignedp
>                   ? udiv_cost (speed, compute_mode)
>                   : sdiv_cost (speed, compute_mode));
>
> which is reading the pre-calculated costs from a table. Thus we don't
> really know the denominator and cannot estimate the cost based on it :(.

Ah, too bad. OTOH for the actual case it decomposes it could compute the
real cost, avoiding the table which is filled with reg-reg operations only.

> CSE really invokes the cost hook with the actual (mod (a, (const_int
> 1000000007)) RTX but it's less important.
>
> --
> Xi Ruoyao <xry...@xry111.site>
> School of Aerospace Science and Technology, Xidian University
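
For readers unfamiliar with the "multiplication to reciprocal" reduction
mentioned in the quoted patch description, here is a minimal C sketch of the
idea for the constant divisor in the test case. It is an illustration only
and is not the sequence GCC emits: GCC chooses its own multiplier and shift,
while this sketch uses a simpler estimate-and-correct variant. The names
DIVISOR, RECIPROCAL, mulhi_u64 and mod_by_reciprocal are hypothetical, and
unsigned __int128 is assumed to be available (a GCC/Clang extension on
64-bit targets).

```c
#include <stdint.h>

/* Constant divisor taken from the quoted test case.  */
#define DIVISOR UINT64_C(1000000007)

/* floor ((2^64 - 1) / DIVISOR): a 64-bit approximation of 2^64 / DIVISOR.
   Hypothetical name; computed at compile time.  */
#define RECIPROCAL (UINT64_MAX / DIVISOR)

/* High 64 bits of the 128-bit product a * b.  Assumes the compiler
   provides unsigned __int128.  */
static inline uint64_t
mulhi_u64 (uint64_t a, uint64_t b)
{
  return (uint64_t) (((unsigned __int128) a * b) >> 64);
}

/* Compute n % DIVISOR without a division instruction.  */
uint64_t
mod_by_reciprocal (uint64_t n)
{
  /* Quotient estimate.  RECIPROCAL rounds down, so q never exceeds
     n / DIVISOR and the subtraction below cannot wrap.  */
  uint64_t q = mulhi_u64 (n, RECIPROCAL);
  uint64_t r = n - q * DIVISOR;

  /* The estimate is short by at most 2, so this loop runs at most
     twice.  */
  while (r >= DIVISOR)
    r -= DIVISOR;

  return r;
}
```

The point for the cost discussion is that the replacement costs one widening
multiply, one ordinary multiply, a subtract and a short fix-up. Whether that
beats a divide depends on the division cost the backend reports, which is
why a best-case division latency can suppress the transformation.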