JulianJuelg opened a new pull request, #2428: URL: https://github.com/apache/systemds/pull/2428
This PR adds a Java Vector API implementation for dense codegen primitives in the following groups:

- Aggregation
- Division
- Comparison
- Multiply-add (remaining)

The new vectorized implementations were benchmarked against the previous scalar-loop versions (see results below) with JMH microbenchmarks and a standalone Java benchmark suite included in this PR. In most cases, both harnesses show the same trend. Where they differ slightly, JMH is used as the primary signal because its results are less volatile.

For each primitive, I compared the Vector API version to the existing scalar loop:

- If performance was equal or better, I replaced the scalar loop with the vectorized implementation.
- If the Vector API version was slower, I kept the scalar implementation as the default and left the vectorized version in the codebase for reference.

Benchmark setup

- JDK version: 21
- JMH version: 1.37
- OS: macOS
- Machine: Apple M2/M, 16 GB RAM, 128-bit SIMD vector width
- Input size (double arrays): 1,000,000 elements
- Warmup time: 1 s per primitive
- Measurement: 1 iteration
- JMH params: 2 forks

Note: These benchmarks were run with a 128-bit SIMD vector width, which gives only 2 lanes for doubles. On production deployments with wider SIMD (e.g., 256-bit or 512-bit where available), the vectorized implementations are expected to provide equal or better speedups due to increased lane-level parallelism.
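To make the lane arithmetic in the note concrete, here is a minimal sketch in plain Java. It does not use the Vector API itself, and the class `LaneMath` and its methods are hypothetical names for illustration, not code from this PR. It only shows how many 64-bit double lanes fit into a given vector width and how an array of the benchmarked size would split into full-vector iterations plus a scalar tail.

```java
// Illustrative helper (hypothetical, not part of the PR): lane arithmetic for double vectors.
final class LaneMath {
    // Number of 64-bit double lanes in a SIMD register of the given bit width.
    static int lanesForDouble(int vectorBits) {
        return vectorBits / Double.SIZE;
    }

    // Largest index bound that is a multiple of the lane count; elements from
    // this bound up to length would be handled by a scalar tail loop.
    static int vectorLoopBound(int length, int lanes) {
        return length - (length % lanes);
    }

    public static void main(String[] args) {
        int length = 1_000_000; // input size from the benchmark setup
        for (int bits : new int[] {128, 256, 512}) {
            int lanes = lanesForDouble(bits);
            int bound = vectorLoopBound(length, lanes);
            System.out.printf("%d-bit: %d lanes, %d vector iterations, %d tail elements%n",
                bits, lanes, bound / lanes, length - bound);
        }
    }
}
```

For the 1,000,000-element inputs used here, a 128-bit width gives 2 lanes and 500,000 full-vector iterations with no tail, while a 512-bit width would give 8 lanes and 125,000 iterations. Fewer iterations do not by themselves guarantee a proportional speedup, since memory bandwidth and per-operation costs also matter.
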
| Primitive Function | ns/op (JMH) | JMH Test: Speedup with Vector API | Java Test: Speedup with Vector API | Replaced |
|---|---:|---:|---:|---|
| vectDivAdd | 231671 | 1.066 | 1.887 | Yes |
| vectDivAdd2 | 218818 | 1.066 | 1.686 | Yes |
| vectDivWrite | 359339 | 0.687 | 1.489 | No |
| vectDivWrite2 | 343183 | 0.7215 | 0.717 | No |
| vectDivWrite3 | 535898 | 0.7821 | 0.603 | No |
| rowMaxsVectMult | 298328 | 1.006 | 1.346 | Yes |
| rowMaxsVectMult_aix | 738767 | 0.115 | 0.077 | No |
| vectSum | 142065 | 0.322 | 0.565 | No |
| vectMax | 596046 | 2.002 | 1.933 | Yes |
| vectCountnnz | 297805 | 1.594 | 1.538 | Yes |
| vectEqualAdd | 427437 | 1.959 | 2.077 | Yes |
| vectEqualWrite2 | 414717 | 1.183 | 0.801 | Yes |
| vectEqualWrite | 415329 | 1.189 | 1.402 | Yes |
| vectGreaterAdd | 427981 | 1.936 | 2.114 | Yes |
| vectGreaterWrite2 | 552023 | 0.588 | 0.919 | No |
| vectGreaterWrite | 458332 | 1.309 | 0.927 | Yes |
| vectLessAdd | 531844 | 2.433 | 2.052 | Yes |
| vectLessWrite2 | 545457 | 1.011 | 0.951 | Yes |
| vectLessWrite | 414025 | 1.203 | 1.039 | Yes |
| vectLessequalAdd | 426307 | 1.960 | 2.052 | Yes |
| vectLessequalWrite2 | 540476 | 1.014 | 0.962 | Yes |
| vectLessequalWrite | 414514 | 1.181 | 0.953 | Yes |
| vectMin | 589668 | 2.000 | 1.996 | Yes |
| vectMult2Add | 228636 | 1.052 | 1.284 | Yes |
| vectMult2Write | 377074 | 2.136 | 1.375 | Yes |
| vectNotequalAdd | 424749 | 1.945 | 1.643 | Yes |
| vectNotequalWrite2 | 566433 | 0.714 | 0.821 | No |
| vectNotequalWrite | 417206 | 1.203 | 0.941 | Yes |

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
