Hi Brian,

setting up µ-benchmarks to get meaningful results on code that has execution paths with quite different shapes, like the current and the patched one, requires some care.

I chose to exercise all paths at each iteration rather than to rely on a high iteration count. Without real world data at hand, I assume the arguments are uniformly distributed.

(Exercising individual paths by choosing appropriate data tends to show optimistic results because the JIT can use profiling information to get rid of the unused code in the other paths and thus optimize better. This might or might not be realistic, depending on use cases.)


Here's how I did the measurements.

    @Benchmark
    public long testUdiv() {
        return Long.divideUnsigned(n0, d0) | Long.divideUnsigned(n1, d1)
         | Long.divideUnsigned(n2, d2) | Long.divideUnsigned(n3, d3);
    }

The different paths can all be exercised in the same benchmark. I hope that combining the longs with | doesn't taint the results significantly.

In case of the current code, the arguments are setup at each iteration as random long such that
n0 any,  d0 < 0
n1 any,  d1 < 0
n2 > 0,  d2 > 0
n3 <= 0, d3 > 0
The path with negative divisor is executed two times more than the other paths to account for a 50% + 25% + 25% probability distribution to hit the paths.

For the patched code, the setup is
n0 any,  d0 > 0
n1 any,  d1 > 0
n2 any,  d2 < 0
n3 any , d3 < 0
so that the two paths are executed with equal probability.

The results on the current code:
MyBenchmark.testUdiv  thrpt   25  6617777.520 ± 238817.781  ops/s
and on the patched code:
MyBenchmark.testUdiv  thrpt   25  33763422.461 ± 446012.270  ops/s
showing a speedup of about 5x, provided the rationale is sound.

After these, I didn't care to measure remainderUnsigned(). It could be done similarly.


Greetings
Raffaello


On 2020-09-02 23:26, Raffaello Giulietti wrote:
I'll also add JMH results.

Reply via email to