SteNicholas commented on PR #1245: URL: https://github.com/apache/datafusion-comet/pull/1245#issuecomment-2581666737
@andygrove, you could refer to https://github.com/apache/shardingsphere/pull/13275. BTW, the JMH test is as follows:

```
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

@Fork(3)
@Warmup(iterations = 3, time = 5)
@Measurement(iterations = 3, time = 5)
@Threads(16)
@State(Scope.Benchmark)
public class ConcurrentHashMapBenchmark {

    private static final String KEY = "key";

    private static final Object VALUE = new Object();

    // One map shared by all 16 benchmark threads (Scope.Benchmark).
    private final Map<String, Object> concurrentMap = new ConcurrentHashMap<>(1, 1);

    // Start every iteration with an empty map.
    @Setup(Level.Iteration)
    public void setup() {
        concurrentMap.clear();
    }

    // Look the key up with get() first; call computeIfAbsent() only on a miss.
    @Benchmark
    public Object benchGetBeforeComputeIfAbsent() {
        Object result = concurrentMap.get(KEY);
        if (null == result) {
            result = concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
        }
        return result;
    }

    // Always call computeIfAbsent() directly.
    @Benchmark
    public Object benchComputeIfAbsent() {
        return concurrentMap.computeIfAbsent(KEY, __ -> VALUE);
    }
}
```

- JDK-8: The performance of the two methods differs by more than two orders of magnitude. Calling computeIfAbsent directly reaches throughput on the order of millions of operations per second (about 8.8 million ops/s), while calling get first reaches the order of billions (about 2.4 billion ops/s), both measured with 16 threads. In terms of resources, CPU utilization stayed at around 20% during the benchComputeIfAbsent test, but at around 100% during the benchGetBeforeComputeIfAbsent test.
```
# JMH version: 1.33
# VM version: JDK 1.8.0_311, Java HotSpot(TM) 64-Bit Server VM, 25.311-b11
# VM invoker: /usr/local/java/jdk1.8.0_311/jre/bin/java
# VM options: -Dvisualvm.id=172855224679674
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: ConcurrentHashMapBenchmark.benchComputeIfAbsent

# Run progress: 0.00% complete, ETA 00:03:00
# Fork: 1 of 3
# Warmup Iteration   1: 11173878.242 ops/s
# Warmup Iteration   2: 8471364.065 ops/s
# Warmup Iteration   3: 8766401.960 ops/s
Iteration   1: 8776260.796 ops/s
Iteration   2: 8632907.974 ops/s
Iteration   3: 8557264.788 ops/s

# Run progress: 16.67% complete, ETA 00:02:33
# Fork: 2 of 3
# Warmup Iteration   1: 7757506.431 ops/s
# Warmup Iteration   2: 8176991.807 ops/s
# Warmup Iteration   3: 8795107.589 ops/s
Iteration   1: 8668883.337 ops/s
Iteration   2: 8866318.073 ops/s
Iteration   3: 8848517.540 ops/s

# Run progress: 33.33% complete, ETA 00:02:02
# Fork: 3 of 3
# Warmup Iteration   1: 8154698.571 ops/s
# Warmup Iteration   2: 8317945.491 ops/s
# Warmup Iteration   3: 8884286.732 ops/s
Iteration   1: 8912555.062 ops/s
Iteration   2: 8894750.001 ops/s
Iteration   3: 8780504.227 ops/s

Result "ConcurrentHashMapBenchmark.benchComputeIfAbsent":
  8770884.644 ±(99.9%) 210678.797 ops/s [Average]
  (min, avg, max) = (8557264.788, 8770884.644, 8912555.062), stdev = 125371.573
  CI (99.9%): [8560205.847, 8981563.442] (assumes normal distribution)

# JMH version: 1.33
# VM version: JDK 1.8.0_311, Java HotSpot(TM) 64-Bit Server VM, 25.311-b11
# VM invoker: /usr/local/java/jdk1.8.0_311/jre/bin/java
# VM options: -Dvisualvm.id=172855224679674
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent

# Run progress: 50.00% complete, ETA 00:01:31
# Fork: 1 of 3
# Warmup Iteration   1: 1881091972.510 ops/s
# Warmup Iteration   2: 1843432746.197 ops/s
# Warmup Iteration   3: 2353506882.860 ops/s
Iteration   1: 2389458285.091 ops/s
Iteration   2: 2391001171.657 ops/s
Iteration   3: 2387181602.010 ops/s

# Run progress: 66.67% complete, ETA 00:01:01
# Fork: 2 of 3
# Warmup Iteration   1: 1872514017.315 ops/s
# Warmup Iteration   2: 1855584197.510 ops/s
# Warmup Iteration   3: 2342392977.207 ops/s
Iteration   1: 2378551289.692 ops/s
Iteration   2: 2374081014.168 ops/s
Iteration   3: 2389909613.865 ops/s

# Run progress: 83.33% complete, ETA 00:00:30
# Fork: 3 of 3
# Warmup Iteration   1: 1880210774.729 ops/s
# Warmup Iteration   2: 1804266170.900 ops/s
# Warmup Iteration   3: 2337740394.373 ops/s
Iteration   1: 2363741084.192 ops/s
Iteration   2: 2372565304.724 ops/s
Iteration   3: 2388015878.515 ops/s

Result "ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent":
  2381611693.768 ±(99.9%) 16356182.057 ops/s [Average]
  (min, avg, max) = (2363741084.192, 2381611693.768, 2391001171.657), stdev = 9733301.586
  CI (99.9%): [2365255511.711, 2397967875.825] (assumes normal distribution)

# Run complete. Total time: 00:03:03

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                                  Mode  Cnt           Score          Error  Units
ConcurrentHashMapBenchmark.benchComputeIfAbsent           thrpt    9     8770884.644 ±   210678.797  ops/s
ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent  thrpt    9  2381611693.768 ± 16356182.057  ops/s
```

- JDK-17: computeIfAbsent is somewhat slower than the get-first variant (about 1.5 billion vs. 2.2 billion ops/s, roughly 30% lower), but the two are within the same order of magnitude. Moreover, the CPU is fully loaded while running both benchmarks.

```
# JMH version: 1.33
# VM version: JDK 17.0.1, Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
# VM invoker: /usr/local/java/jdk-17.0.1/bin/java
# VM options: -Dvisualvm.id=173221627574053
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: ConcurrentHashMapBenchmark.benchComputeIfAbsent

# Run progress: 0.00% complete, ETA 00:03:00
# Fork: 1 of 3
# Warmup Iteration   1: 1544327446.565 ops/s
# Warmup Iteration   2: 1475077923.449 ops/s
# Warmup Iteration   3: 1565544222.606 ops/s
Iteration   1: 1564346089.698 ops/s
Iteration   2: 1560062375.891 ops/s
Iteration   3: 1552569020.412 ops/s

# Run progress: 16.67% complete, ETA 00:02:33
# Fork: 2 of 3
# Warmup Iteration   1: 1617143507.004 ops/s
# Warmup Iteration   2: 1433136907.916 ops/s
# Warmup Iteration   3: 1527623176.866 ops/s
Iteration   1: 1522331660.180 ops/s
Iteration   2: 1524798683.186 ops/s
Iteration   3: 1522686827.744 ops/s

# Run progress: 33.33% complete, ETA 00:02:02
# Fork: 3 of 3
# Warmup Iteration   1: 1671732222.173 ops/s
# Warmup Iteration   2: 1462966231.429 ops/s
# Warmup Iteration   3: 1553792663.545 ops/s
Iteration   1: 1549840468.944 ops/s
Iteration   2: 1549245571.349 ops/s
Iteration   3: 1554801575.735 ops/s

Result "ConcurrentHashMapBenchmark.benchComputeIfAbsent":
  1544520252.571 ±(99.9%) 27953594.118 ops/s [Average]
  (min, avg, max) = (1522331660.180, 1544520252.571, 1564346089.698), stdev = 16634735.479
  CI (99.9%): [1516566658.453, 1572473846.689] (assumes normal distribution)

# JMH version: 1.33
# VM version: JDK 17.0.1, Java HotSpot(TM) 64-Bit Server VM, 17.0.1+12-LTS-39
# VM invoker: /usr/local/java/jdk-17.0.1/bin/java
# VM options: -Dvisualvm.id=173221627574053
# Blackhole mode: full + dont-inline hint (default, use -Djmh.blackhole.autoDetect=true to auto-detect)
# Warmup: 3 iterations, 5 s each
# Measurement: 3 iterations, 5 s each
# Timeout: 10 min per iteration
# Threads: 16 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent

# Run progress: 50.00% complete, ETA 00:01:31
# Fork: 1 of 3
# Warmup Iteration   1: 1813078468.960 ops/s
# Warmup Iteration   2: 1944438216.902 ops/s
# Warmup Iteration   3: 2232703681.960 ops/s
Iteration   1: 2233727123.664 ops/s
Iteration   2: 2233657163.983 ops/s
Iteration   3: 2229008772.953 ops/s

# Run progress: 66.67% complete, ETA 00:01:01
# Fork: 2 of 3
# Warmup Iteration   1: 1767187585.805 ops/s
# Warmup Iteration   2: 1900420998.518 ops/s
# Warmup Iteration   3: 2175122268.840 ops/s
Iteration   1: 2180409680.029 ops/s
Iteration   2: 2181398523.091 ops/s
Iteration   3: 2176454597.329 ops/s

# Run progress: 83.33% complete, ETA 00:00:30
# Fork: 3 of 3
# Warmup Iteration   1: 1822355551.990 ops/s
# Warmup Iteration   2: 1832618832.110 ops/s
# Warmup Iteration   3: 2225265888.631 ops/s
Iteration   1: 2240765668.888 ops/s
Iteration   2: 2225847700.599 ops/s
Iteration   3: 2232257415.965 ops/s

Result "ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent":
  2214836294.056 ±(99.9%) 45190341.578 ops/s [Average]
  (min, avg, max) = (2176454597.329, 2214836294.056, 2240765668.888), stdev = 26892047.412
  CI (99.9%): [2169645952.478, 2260026635.633] (assumes normal distribution)

# Run complete. Total time: 00:03:03

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                                  Mode  Cnt           Score          Error  Units
ConcurrentHashMapBenchmark.benchComputeIfAbsent           thrpt    9  1544520252.571 ± 27953594.118  ops/s
ConcurrentHashMapBenchmark.benchGetBeforeComputeIfAbsent  thrpt    9  2214836294.056 ± 45190341.578  ops/s
```
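For anyone who wants to apply the get-first pattern from the benchmark at a call site, it could be wrapped in a small helper along these lines. This is only a sketch: `MapLookups` and `getOrCompute` are hypothetical names chosen for illustration, not code from this PR or from the ShardingSphere change, and it assumes the mapping function never needs to run when the key is already present.

```java
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical helper (not part of the PR): wraps the "get before computeIfAbsent"
// pattern measured by benchGetBeforeComputeIfAbsent.
final class MapLookups {

    private MapLookups() {
    }

    static <K, V> V getOrCompute(ConcurrentMap<K, V> map, K key,
                                 Function<? super K, ? extends V> mappingFunction) {
        // Fast path: a plain get() covers the common case where the key already exists.
        V value = map.get(key);
        if (value != null) {
            return value;
        }
        // Slow path: the key was absent at the time of the get(), so fall back to
        // computeIfAbsent(), which still guarantees that only one value is installed.
        return map.computeIfAbsent(key, mappingFunction);
    }
}
```

A call site would then look like `MapLookups.getOrCompute(cache, key, k -> createValue(k))`, where `cache` and `createValue` are placeholders. Judging by the numbers above, the extra `get` mainly pays off on JDK 8; on JDK 17 the gap is within the same order of magnitude.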