[ https://issues.apache.org/jira/browse/GEOMETRY-75?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026329#comment-17026329 ]
Alex Herbert commented on GEOMETRY-75:
--------------------------------------

I see that you did use a JDK from 9+. Here's the difference on my laptop between JDK 8 and 9:
{noformat}
Benchmark                      (type)  Mode  Cnt    Score   Error  Units
VectorPerformance.norm2D       random  avgt    5  291.902 ± 2.366  ns/op
VectorPerformance.norm2D         edge  avgt    5   29.761 ± 0.081  ns/op
VectorPerformance.normalize2D     N/A  avgt    5  319.804 ± 9.047  ns/op

Benchmark                      (type)  Mode  Cnt    Score   Error  Units
VectorPerformance.norm2D       random  avgt    5   20.071 ± 0.757  ns/op
VectorPerformance.norm2D         edge  avgt    5   15.815 ± 2.870  ns/op
VectorPerformance.normalize2D     N/A  avgt    5   19.371 ± 3.843  ns/op
{noformat}
Note that the hypot function does a fair bit of work to protect against over/underflow. If you replace it with {{Math.sqrt(x*x+y*y)}} you will see a big difference.

I note that your benchmarks use 1 vector. Best practice for benchmarking code with lots of branches is to have a lot of data. This applies to Math.hypot. With the same input, the branch prediction process in the CPU will learn the input and choose the right branch. So the JMH result gives you the time for the computations on the correct path, not the time it would take if the data were unpredictable (which would be slower, as some cycles are lost when an incorrect branch is pipelined and has to be corrected).

Math.hypot uses extended precision multiplication to achieve results within 1 ULP of the exact answer. Part of this involves a branch where, given {{|x| > |y|}}, it chooses a different calculation if {{|2y| > |x|}}, i.e. the values are within 2-fold of each other. On most input combinations with the same range for x and y this will occur 50% of the time. It is hard to predict when it will happen from the earlier branches in the same function, which only check which value is larger. This impacts the performance of hypot. I've tested this using a mock which returns an incorrect result, just to make the branch do something.
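As an aside, the over/underflow protection mentioned above is easy to demonstrate outside JMH. This is just an illustrative sketch (the class and method names are mine, not from the benchmark module): for moderate magnitudes the naive form agrees with hypot to a few ULP, but for large components its intermediate square overflows.

```java
import java.util.Random;

/** Illustrative comparison of Math.hypot vs the naive sqrt form. */
public class HypotSketch {

    /** Fast, but the intermediate squares can overflow or underflow. */
    static double naiveNorm(double x, double y) {
        return Math.sqrt(x * x + y * y);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int i = 0; i < 1000; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            // For moderate magnitudes the two agree to within a few ULP.
            double a = Math.hypot(x, y);
            double b = naiveNorm(x, y);
            if (Math.abs(a - b) > 4 * Math.ulp(a)) {
                throw new AssertionError("mismatch: " + a + " vs " + b);
            }
        }
        // Where they differ: x*x overflows to infinity for large x,
        // while hypot rescales internally and stays finite.
        double big = 1e200;
        System.out.println(naiveNorm(big, big));      // Infinity
        System.out.println(Math.hypot(big, big));     // finite, about 1.414e200
    }
}
```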
This code is slower than returning {{x * x + y * y}}, as the unpredictable 50% guess on what to do impacts performance:
{code:java}
final double w = x - y;
if (w > y) {
    return x;
}
// 2y > x > y
return y;
{code}
To try more data you could add {{@Setup(Level.Iteration)}}. This will generate a different vector for each iteration, so 5 in total with the current defaults. I did this and you can see the two branches being run at max speed in norm2D. Here it is not quite 50/50 (I get a more even ratio with more iterations), but one branch is 17.8ns and the other 14.6ns.
{noformat}
java -jar target/examples-jmh.jar VectorPerformance.*norm2D -i 10 -p type=random

Iteration  1: 17.831 ns/op
Iteration  2: 17.849 ns/op
Iteration  3: 17.804 ns/op
Iteration  4: 14.762 ns/op
Iteration  5: 17.905 ns/op
Iteration  6: 14.667 ns/op
Iteration  7: 18.038 ns/op
Iteration  8: 17.827 ns/op
Iteration  9: 17.857 ns/op
Iteration 10: 14.672 ns/op
{noformat}
For your standard random data, and the tests for 1D and 3D, there are no branches, so 1 data sample is enough.

The bigger concern here is the edge cases. You have 9 edge case numbers in your array and you create a max length vector of size 3 from them, so the edge case test is not validating all possible edge cases. You have two options here:
* Add a {{@Benchmark}} for each specific edge case (NaN, Inf, 0, as documented in the Vectors.norm methods)
* Alter the test to create a set of vectors of a given size and have the benchmark loop over them all

The second method has some overhead. However, you can just mock up a similar test with a no-op in place of your normalisation method.
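A rough sketch of generating such a pool of edge-case inputs to loop over (the EDGE array contents follow the documented special cases; the class and method names are illustrative assumptions, not the actual benchmark code):

```java
import java.util.Random;

/** Illustrative generation of 2D edge-case inputs for a looping benchmark. */
public class EdgeCasePool {

    /** Edge values of interest for the norm methods: zeros, extremes, infinities, NaN. */
    static final double[] EDGE = {
        0.0, -0.0, 1.0, -1.0,
        Double.MAX_VALUE, Double.MIN_VALUE,
        Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN
    };

    /** Build n (x, y) pairs drawn uniformly at random from the edge values. */
    static double[][] createPairs(int n, long seed) {
        Random rng = new Random(seed);
        double[][] pairs = new double[n][2];
        for (int i = 0; i < n; i++) {
            pairs[i][0] = EDGE[rng.nextInt(EDGE.length)];
            pairs[i][1] = EDGE[rng.nextInt(EDGE.length)];
        }
        return pairs;
    }

    public static void main(String[] args) {
        // A benchmark method would loop over all pairs and consume every result
        // so that the different branches in hypot are all exercised.
        double sum = 0;
        for (double[] p : createPairs(1000, 123L)) {
            sum += Math.hypot(p[0], p[1]);
        }
        System.out.println("pool exercised, sum = " + sum);
    }
}
```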
This encapsulates all the same overhead of object creation without doing any computation:
{code:java}
@Benchmark
public Vector3D[] baseline(final NormalizableVectorInput3D input) {
    Vector3D[] data = input.getData();
    Vector3D[] result = new Vector3D[data.length];
    for (int i = 0; i < data.length; i++) {
        Vector3D v = data[i];
        result[i] = Vector3D.of(v.getX(), v.getY(), v.getZ());
    }
    return result;
}
{code}
If you subtract the time for this from your real benchmarks it provides the time for the operation itself.

I see you are using the log-uniform type distribution for producing random doubles. This is fine if you want to generate numbers with fully random 52-bit mantissas and random exponents, but these may not represent your actual expected data distribution. A double has a lot of values, but they are not uniformly spread. In the Complex benchmark I have added unnormalized vectors using a NormalizedGaussianSampler for each dimension. This puts your ND vector at a random orientation and length. I've been working on a branch but I'll add this to the numbers master so you can have a look. It applies perfectly to what you are benchmarking here.

> Performance Test Module
> -----------------------
>
>                 Key: GEOMETRY-75
>                 URL: https://issues.apache.org/jira/browse/GEOMETRY-75
>             Project: Apache Commons Geometry
>          Issue Type: Task
>            Reporter: Matt Juntunen
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Add a module for executing performance tests.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)