Re: RFR: 8238669: Long.divideUnsigned is extremely slow for certain values (Needs to be Intrinsic)

Raffaello Giulietti Thu, 03 Sep 2020 05:56:40 -0700

Hi Brian,

setting up µ-benchmarks to get meaningful results on code that hasexecution paths with quite different shapes, like the current and thepatched one, requires some care.

I chose to exercise all paths at each iteration rather than to rely on ahigh iteration count. Without real world data at hand, I assume thearguments are uniformly distributed.

(Exercising individual paths by choosing appropriate data tends to showoptimistic results because the JIT can use profiling information to getrid of the unused code in the other paths and thus optimize better. Thismight or might not be realistic, depending on use cases.)



Here's how I did the measurements.

    @Benchmark
    public long testUdiv() {
        return Long.divideUnsigned(n0, d0) | Long.divideUnsigned(n1, d1)
         | Long.divideUnsigned(n2, d2) | Long.divideUnsigned(n3, d3);
    }

The different paths can all be exercised in the same benchmark. I hopethat combining the longs with | doesn't taint the results significantly.

In case of the current code, the arguments are setup at each iterationas random long such that

n0 any,  d0 < 0
n1 any,  d1 < 0
n2 > 0,  d2 > 0
n3 <= 0, d3 > 0

The path with negative divisor is executed two times more than the otherpaths to account for a 50% + 25% + 25% probability distribution to hitthe paths.


For the patched code, the setup is
n0 any,  d0 > 0
n1 any,  d1 > 0
n2 any,  d2 < 0
n3 any , d3 < 0
so that the two paths are executed with equal probability.

The results on the current code:
MyBenchmark.testUdiv  thrpt   25  6617777.520 ± 238817.781  ops/s
and on the patched code:
MyBenchmark.testUdiv  thrpt   25  33763422.461 ± 446012.270  ops/s
showing a speedup of about 5x, provided the rationale is sound.

After these, I didn't care to measure remainderUnsigned(). It could bedone similarly.



Greetings
Raffaello


On 2020-09-02 23:26, Raffaello Giulietti wrote:

I'll also add JMH results.

Re: RFR: 8238669: Long.divideUnsigned is extremely slow for certain values (Needs to be Intrinsic)

Reply via email to