mccullocht commented on PR #15742: URL: https://github.com/apache/lucene/pull/15742#issuecomment-3937456120
It looks like there are some losses on x86: binaryHalfByteDotProductVector sees a small loss (66.883 → 62.781 ops/us, about 6%), and binaryHalfByteSquareVector sees a fairly large one (71.097 → 58.312 ops/us, about 18%). I'll investigate a bit.

AMD Ryzen AI 395 (AVX 512):

Baseline:
```
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15  23.318 ± 0.107  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  11.839 ± 0.075  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  66.883 ± 0.965  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  29.886 ± 0.167  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  12.464 ± 0.374  ops/us
VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  71.097 ± 0.476  ops/us
```

Experiment:
```
Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15  35.444 ± 0.184  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  42.581 ± 0.464  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  62.781 ± 0.686  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  33.665 ± 0.254  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  41.314 ± 0.367  ops/us
VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  58.312 ± 0.703  ops/us
```

Mac M2 (128 bit/NEON):

Baseline:
```
Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15  15.866 ± 0.206  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15   2.746 ± 0.029  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  13.612 ± 0.127  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  15.815 ± 0.068  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15   2.758 ± 0.031  ops/us
VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  13.440 ± 0.088  ops/us
```

Experiment:
```
Benchmark                                                       (size)   Mode  Cnt   Score   Error   Units
VectorUtilBenchmark.binaryHalfByteDotProductBothPackedVector      1024  thrpt   15  23.285 ± 0.371  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductSinglePackedVector    1024  thrpt   15  25.559 ± 0.601  ops/us
VectorUtilBenchmark.binaryHalfByteDotProductVector                1024  thrpt   15  17.269 ± 1.498  ops/us
VectorUtilBenchmark.binaryHalfByteSquareBothPackedVector          1024  thrpt   15  21.115 ± 0.188  ops/us
VectorUtilBenchmark.binaryHalfByteSquareSinglePackedVector        1024  thrpt   15  24.063 ± 0.477  ops/us
VectorUtilBenchmark.binaryHalfByteSquareVector                    1024  thrpt   15  17.184 ± 0.077  ops/us
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
