[ 
https://issues.apache.org/jira/browse/GEOMETRY-75?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026329#comment-17026329
 ] 

Alex Herbert commented on GEOMETRY-75:
--------------------------------------

I see that you did use a JDK from 9+. Here's the difference on my laptop 
between JDK 8 and 9:
{noformat}
Benchmark                      (type)  Mode  Cnt    Score   Error  Units
VectorPerformance.norm2D       random  avgt    5  291.902 ± 2.366  ns/op
VectorPerformance.norm2D         edge  avgt    5   29.761 ± 0.081  ns/op
VectorPerformance.normalize2D     N/A  avgt    5  319.804 ± 9.047  ns/op

Benchmark                      (type)  Mode  Cnt   Score   Error  Units
VectorPerformance.norm2D       random  avgt    5  20.071 ± 0.757  ns/op
VectorPerformance.norm2D         edge  avgt    5  15.815 ± 2.870  ns/op
VectorPerformance.normalize2D     N/A  avgt    5  19.371 ± 3.843  ns/op
{noformat}
Note that the hypot function does a fair bit of work to protect against over/underflow. If 
you replace it with {{Math.sqrt(x*x+y*y)}} you will see a big difference.
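As a rough sketch of the comparison (the helper method names here are just for illustration, not library code):
{code:java}
// Math.hypot protects the squared terms from over/underflow at the cost of speed.
static double safeNorm2D(double x, double y) {
    return Math.hypot(x, y);
}

// The naive form is much faster but overflows when |x| or |y| is larger than
// about 1e154 and loses accuracy for very small (subnormal) components.
static double fastNorm2D(double x, double y) {
    return Math.sqrt(x * x + y * y);
}
{code}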

I note that your benchmarks use a single vector. Best practice for benchmarking code 
with lots of branches is to use a lot of data. This applies to Math.hypot: with the 
same input every time, the CPU's branch predictor will learn the input and choose the 
right branch, so the JMH result gives you the time for the computation on the correct 
path, not the time it would take if the data were unpredictable (which would be slower, 
as cycles are lost whenever an incorrect branch is pipelined and has to be corrected).
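A sketch of what benchmarking over a lot of data could look like (the class and field names are hypothetical, not the actual benchmark code):
{code:java}
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

public class HypotBranchBenchmark {
    /** Arrays of unpredictable inputs so the branch predictor cannot learn
     * a single fixed path through Math.hypot. */
    @State(Scope.Benchmark)
    public static class RandomData {
        double[] x;
        double[] y;

        @Setup(Level.Trial)
        public void setup() {
            final ThreadLocalRandom rng = ThreadLocalRandom.current();
            x = rng.doubles(1000).toArray();
            y = rng.doubles(1000).toArray();
        }
    }

    @Benchmark
    public void hypotManyInputs(final RandomData data, final Blackhole bh) {
        for (int i = 0; i < data.x.length; i++) {
            bh.consume(Math.hypot(data.x[i], data.y[i]));
        }
    }
}
{code}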

Math.hypot uses extended precision multiplication to achieve results within 1 ULP of 
the exact answer. Part of this involves a branch where, given {{|x| > |y|}}, it 
chooses a different calculation if {{|2y| > |x|}}, i.e. the values are within 2-fold 
of each other. For most input combinations with the same range for x and y this will 
occur 50% of the time, and it is hard to predict from the earlier branches in the same 
function, which only check which value is larger. This impacts the performance of 
hypot. I've tested this using a mock which returns an incorrect result just to make 
the branch do something. This code is slower than returning {{x * x + y * y}} because 
the unpredictable 50% guess on which path to take hurts performance.
{code:java}
        // Mock of the hypot branch: assumes x and y are non-negative with x > y.
        final double w = x - y;
        if (w > y) {
            // x > 2y: the values differ by more than 2-fold
            return x;
        }
        // 2y > x > y: the values are within 2-fold of each other
        return y;
{code}
To try more data you could add {{@Setup(Level.Iteration)}}. This will generate a 
different vector for each iteration, so 5 in total with the current defaults. I did 
this and you can see the two branches each running at max speed in norm2D. Here it is 
not quite 50/50 (I get a more even split with more iterations) but one branch is 
17.8 ns and the other 14.6 ns. A sketch of what such a setup could look like follows 
the output below.
{noformat}
java -jar target/examples-jmh.jar VectorPerformance.*norm2D -i 10 -p type=random
Iteration   1: 17.831 ns/op
Iteration   2: 17.849 ns/op
Iteration   3: 17.804 ns/op
Iteration   4: 14.762 ns/op
Iteration   5: 17.905 ns/op
Iteration   6: 14.667 ns/op
Iteration   7: 18.038 ns/op
Iteration   8: 17.827 ns/op
Iteration   9: 17.857 ns/op
Iteration  10: 14.672 ns/op
{noformat}
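A sketch of the kind of state class that could do this (names are hypothetical, not the actual benchmark code):
{code:java}
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

/** Regenerates the vector components once per iteration so that different
 * branches of Math.hypot are exercised across iterations. */
@State(Scope.Benchmark)
public class RandomVectorInput2D {
    private double x;
    private double y;

    @Setup(Level.Iteration)
    public void createVector() {
        final ThreadLocalRandom rng = ThreadLocalRandom.current();
        x = rng.nextDouble();
        y = rng.nextDouble();
    }

    public double getX() {
        return x;
    }

    public double getY() {
        return y;
    }
}
{code}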
For your standard random data, and for the 1D and 3D tests, there are no branches, so 
a single data sample is enough.

The bigger concern here is the edge cases. You have 9 edge case numbers in your array 
but you create a vector of at most size 3 from them, so the edge case test is not 
exercising all the possible edge case combinations.

You have two options here:
 * Add a {{@Benchmark}} for each specific edge case (NaN, Inf, 0 as documented in the 
Vectors.norm methods); see the sketch after this list
 * Alter the test to create a set of vectors of a given size and have the benchmark 
loop over them all
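For the first option, a sketch of per-edge-case benchmarks (names are hypothetical; I'm assuming the two-argument {{Vectors.norm}} overload, with the inputs held in non-final state fields so the JIT cannot constant-fold them):
{code:java}
    /** Edge case inputs (non-final so they are not treated as constants). */
    @State(Scope.Benchmark)
    public static class EdgeInput {
        public double nan = Double.NaN;
        public double inf = Double.POSITIVE_INFINITY;
        public double zero = 0.0;
        public double one = 1.0;
    }

    @Benchmark
    public double normNaN(final EdgeInput in) {
        return Vectors.norm(in.nan, in.one);
    }

    @Benchmark
    public double normInf(final EdgeInput in) {
        return Vectors.norm(in.inf, in.one);
    }

    @Benchmark
    public double normZero(final EdgeInput in) {
        return Vectors.norm(in.zero, in.zero);
    }
{code}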

The second method has some overhead. However, you can just mock up a similar test 
with a no-op in place of your normalisation method. This encapsulates all the same 
overhead of object creation without doing any computation:
{code:java}
    @Benchmark
    public Vector3D[] baseline(final NormalizableVectorInput3D input) {
        final Vector3D[] data = input.getData();
        final Vector3D[] result = new Vector3D[data.length];
        for (int i = 0; i < data.length; i++) {
            final Vector3D v = data[i];
            // Recreate the vector without normalising: same object creation
            // overhead as the real benchmark but no computation.
            result[i] = Vector3D.of(v.getX(), v.getY(), v.getZ());
        }
        return result;
    }
{code}
Subtracting the time for this baseline from your real benchmarks gives the time for 
the operation itself.
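For reference, the matching real benchmark would use the same loop and state class so the subtraction is like-for-like (again a sketch, assuming a {{normalize()}} method on the vector):
{code:java}
    @Benchmark
    public Vector3D[] normalize(final NormalizableVectorInput3D input) {
        final Vector3D[] data = input.getData();
        final Vector3D[] result = new Vector3D[data.length];
        for (int i = 0; i < data.length; i++) {
            // Same loop and array creation as the baseline; only this call differs.
            result[i] = data[i].normalize();
        }
        return result;
    }
{code}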

I see you are using a log-uniform type distribution for producing random doubles. 
This is fine if you want to generate numbers with fully random 52-bit mantissas and 
random exponents, but these may not represent your actual expected data distribution. 
A double can take a lot of values, but they are not uniformly spread over the real 
line.

In the Complex benchmark I have added unnormalized vectors using a 
NormalizedGaussianSampler for each dimension. This puts your ND vector at a 
random orientation and length. I've been working on a branch but I'll add this 
to the numbers master so you can have a look. It applies perfectly to what you 
are benchmarking here.
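A minimal standalone sketch of the idea using Commons RNG (not the benchmark code itself):
{code:java}
import org.apache.commons.geometry.euclidean.threed.Vector3D;
import org.apache.commons.rng.UniformRandomProvider;
import org.apache.commons.rng.sampling.distribution.NormalizedGaussianSampler;
import org.apache.commons.rng.sampling.distribution.ZigguratNormalizedGaussianSampler;
import org.apache.commons.rng.simple.RandomSource;

public class GaussianVectorExample {
    public static void main(String[] args) {
        final UniformRandomProvider rng = RandomSource.create(RandomSource.SPLIT_MIX_64);
        final NormalizedGaussianSampler sampler = new ZigguratNormalizedGaussianSampler(rng);
        // Sampling each component from a normalized Gaussian gives a vector
        // with a uniformly random orientation and an unnormalized length.
        final Vector3D v = Vector3D.of(sampler.sample(), sampler.sample(), sampler.sample());
        System.out.println(v);
    }
}
{code}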

> Performance Test Module
> -----------------------
>
>                 Key: GEOMETRY-75
>                 URL: https://issues.apache.org/jira/browse/GEOMETRY-75
>             Project: Apache Commons Geometry
>          Issue Type: Task
>            Reporter: Matt Juntunen
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a module for executing performance tests.


