[
https://issues.apache.org/jira/browse/GEOMETRY-75?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026329#comment-17026329
]
Alex Herbert commented on GEOMETRY-75:
--------------------------------------
I see that you used a JDK of version 9+. Here's the difference on my laptop
between JDK 8 and JDK 9:
{noformat}
JDK 8:
Benchmark                      (type)  Mode  Cnt    Score   Error  Units
VectorPerformance.norm2D       random  avgt    5  291.902 ± 2.366  ns/op
VectorPerformance.norm2D         edge  avgt    5   29.761 ± 0.081  ns/op
VectorPerformance.normalize2D     N/A  avgt    5  319.804 ± 9.047  ns/op

JDK 9:
Benchmark                      (type)  Mode  Cnt    Score   Error  Units
VectorPerformance.norm2D       random  avgt    5   20.071 ± 0.757  ns/op
VectorPerformance.norm2D         edge  avgt    5   15.815 ± 2.870  ns/op
VectorPerformance.normalize2D     N/A  avgt    5   19.371 ± 3.843  ns/op
{noformat}
Note the hypot function does a fair bit of work to protect against overflow and
underflow. If you replace it with {{Math.sqrt(x*x+y*y)}} you will see a big
difference.
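For example, the naive form overflows when the squared components exceed {{Double.MAX_VALUE}}, while {{Math.hypot}} handles the extreme magnitudes internally. A minimal standalone sketch:

{code:java}
public class HypotVsNaive {
    public static void main(String[] args) {
        // Components large enough that x*x overflows a double.
        double x = 3e200;
        double y = 4e200;
        // Naive form: x*x is ~9e400, which overflows to Infinity.
        double naive = Math.sqrt(x * x + y * y);
        // Math.hypot guards against the overflow and returns ~5e200.
        double safe = Math.hypot(x, y);
        System.out.println("naive = " + naive); // Infinity
        System.out.println("hypot = " + safe);  // ~5e200
    }
}
{code}

That protection is exactly the extra work being paid for on every call with ordinary inputs.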
I note that your benchmarks use a single vector. Best practice for benchmarking
code with lots of branches is to use a lot of data, and this applies to
Math.hypot. With the same input every time, the CPU's branch predictor will
learn the input and choose the right branch, so the JMH result gives you the
time for the computation on the correct path, not the time it would take if the
data were unpredictable (which would be slower, as cycles are lost whenever an
incorrectly predicted branch is pipelined and has to be corrected).
Math.hypot uses extended precision multiplication to achieve results within
1 ULP of the exact answer. Part of this involves a branch: given {{|x| > |y|}},
it chooses a different calculation if {{2|y| > |x|}}, i.e. the values are
within 2-fold of each other. On most input combinations with the same range for
x and y this will occur 50% of the time. It is hard to predict from the earlier
branches in the same function, which only check which value is larger.
This impacts the performance of hypot. I've tested this using a mock which
returns an incorrect result, just to make the branch do something. This code is
slower than returning {{x * x + y * y}}, as the unpredictable 50% guess on
which branch to take impacts performance.
{code:java}
// Mock of hypot's inner branch; assumes x > y >= 0.
final double w = x - y;
if (w > y) {
    // x > 2y
    return x;
}
// 2y >= x > y
return y;
{code}
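To illustrate the 50% figure, the frequency of that inner branch can be estimated directly. A standalone sketch using plain uniform deviates in [0, 1) (not the benchmark's actual distribution; class and method names are illustrative):

{code:java}
import java.util.SplittableRandom;

public class BranchFrequency {
    /** Estimate how often hypot's hard-to-predict branch is taken:
     *  given the larger and smaller of two magnitudes, the branch fires
     *  when the smaller is within 2-fold of the larger. */
    public static double branchFraction(long samples, long seed) {
        final SplittableRandom rng = new SplittableRandom(seed);
        long taken = 0;
        for (long i = 0; i < samples; i++) {
            final double x = rng.nextDouble();
            final double y = rng.nextDouble();
            final double hi = Math.max(x, y);
            final double lo = Math.min(x, y);
            if (2 * lo > hi) {
                taken++;
            }
        }
        return (double) taken / samples;
    }

    public static void main(String[] args) {
        // With x and y drawn from the same uniform range this is ~0.5,
        // i.e. a coin flip the branch predictor cannot learn.
        System.out.println(branchFraction(1_000_000, 42));
    }
}
{code}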
To try more data you could add {{@Setup(Level.Iteration)}}. This will generate
a different vector for each iteration, so 5 in total with the current defaults.
I did this, and you can see the two branches being run at full speed in norm2D.
Here it is not quite 50/50 (I get a more even ratio with more iterations):
one branch takes 17.8 ns and the other 14.6 ns.
{noformat}
java -jar target/examples-jmh.jar VectorPerformance.*norm2D -i 10 -p type=random
Iteration 1: 17.831 ns/op
Iteration 2: 17.849 ns/op
Iteration 3: 17.804 ns/op
Iteration 4: 14.762 ns/op
Iteration 5: 17.905 ns/op
Iteration 6: 14.667 ns/op
Iteration 7: 18.038 ns/op
Iteration 8: 17.827 ns/op
Iteration 9: 17.857 ns/op
Iteration 10: 14.672 ns/op
{noformat}
For your standard random data, and for the 1D and 3D tests, there are no
branches, so one data sample is enough.
The bigger concern here is the edge cases. You have 9 edge case numbers in
your array and you create a maximum-length vector of size 3 from them, so the
edge case test is not validating all possible edge cases.
You have two options here:
* Add a {{@Benchmark}} for each specific edge case (NaN, Inf, 0 as documented
in the Vectors.norm methods)
* Alter the test to create a set of vectors of a given size and the benchmark
has to loop over them all
The second method has some overhead. However, you can just mock up a similar
test with a no-op in place of your normalisation method. This encapsulates all
the same overhead of object creation without doing any computation:
{code:java}
@Benchmark
public Vector3D[] baseline(final NormalizableVectorInput3D input) {
    final Vector3D[] data = input.getData();
    final Vector3D[] result = new Vector3D[data.length];
    for (int i = 0; i < data.length; i++) {
        final Vector3D v = data[i];
        result[i] = Vector3D.of(v.getX(), v.getY(), v.getZ());
    }
    return result;
}
{code}
If you subtract the time for this baseline from your real benchmarks, what
remains is the time for the operation itself.
I see you are using the log-uniform type distribution for producing random
doubles. This is fine if you want to generate numbers with fully random 52-bit
mantissas and random exponents, but these may not represent your actual
expected data distribution. A double has a lot of values, but they are not
uniformly spread over the real line: roughly half of all finite doubles lie
between -1 and 1.
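For reference, a log-uniform style double can be constructed from the IEEE 754 bit layout by pairing a random 52-bit mantissa with a random exponent. A sketch (the method name is illustrative; sub-normals, infinities and NaN are excluded):

{code:java}
import java.util.SplittableRandom;

public class LogUniformDoubles {
    /** Build a positive, finite, normal double with a random 52-bit
     *  mantissa and a random exponent, i.e. (roughly) log-uniform over
     *  the normal range. */
    static double randomLogUniform(SplittableRandom rng) {
        // 52 random mantissa bits.
        final long mantissa = rng.nextLong() & 0x000FFFFFFFFFFFFFL;
        // Biased exponent in [1, 2046]: skips sub-normals (0) and Inf/NaN (2047).
        final long exponent = 1 + rng.nextLong(2046);
        return Double.longBitsToDouble((exponent << 52) | mantissa);
    }

    public static void main(String[] args) {
        final SplittableRandom rng = new SplittableRandom(42);
        for (int i = 0; i < 5; i++) {
            System.out.println(randomLogUniform(rng));
        }
    }
}
{code}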
In the Complex benchmark I have added unnormalized vectors using a
NormalizedGaussianSampler for each dimension. This puts your ND vector at a
random orientation and length. I've been working on a branch but I'll add this
to the numbers master so you can have a look. It applies perfectly to what you
are benchmarking here.
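The idea can be sketched with the JDK's Gaussian sampler standing in for commons-rng's NormalizedGaussianSampler (class and method names here are illustrative, not the benchmark code):

{code:java}
import java.util.Random;

public class GaussianVectors {
    /** Unnormalised 3D vector with independent standard Gaussian components.
     *  By spherical symmetry its direction is uniform on the sphere, and its
     *  length is random. */
    static double[] randomVector3D(Random rng) {
        return new double[] {rng.nextGaussian(), rng.nextGaussian(), rng.nextGaussian()};
    }

    public static void main(String[] args) {
        final Random rng = new Random(42);
        final double[] v = randomVector3D(rng);
        final double norm = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
        System.out.println("length = " + norm);
    }
}
{code}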
> Performance Test Module
> -----------------------
>
> Key: GEOMETRY-75
> URL: https://issues.apache.org/jira/browse/GEOMETRY-75
> Project: Apache Commons Geometry
> Issue Type: Task
> Reporter: Matt Juntunen
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Add a module for executing performance tests.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)