cyb70289 commented on pull request #9635: URL: https://github.com/apache/arrow/pull/9635#issuecomment-796477621
Sum kernel performance against the master branch, on Skylake with clang-9.

- Big drop for floating point types, as expected.
- Consistent improvement for integers with 1% nulls or fewer.
- Big drop for integers with many nulls (10% and 50%).

```
----------------------------------------------------------------------------------
Non-regressions: (18)
----------------------------------------------------------------------------------
benchmark                             baseline        contender  change %  counters
// big improvement for 100% nulls, not very useful
SumKernelFloat/1048576/1       114.281 GiB/sec  853.661 GiB/sec   646.987  {}
SumKernelDouble/1048576/1      203.019 GiB/sec  866.872 GiB/sec   326.991  {}
SumKernelInt16/1048576/1        32.426 GiB/sec  122.892 GiB/sec   278.996  {}
SumKernelInt32/1048576/1        56.421 GiB/sec  212.625 GiB/sec   276.853  {}
SumKernelInt8/1048576/1         27.944 GiB/sec   66.970 GiB/sec   139.653  {}
SumKernelInt64/1048576/1       148.757 GiB/sec  351.812 GiB/sec   136.500  {}
// big improvement for int64 with 0%, 0.01%, 1% nulls, nice
SumKernelInt64/1048576/0        17.103 GiB/sec   41.992 GiB/sec   145.531  {}
SumKernelInt64/1048576/10000    15.393 GiB/sec   34.708 GiB/sec   125.477  {}
SumKernelInt64/1048576/100      11.021 GiB/sec   17.252 GiB/sec    56.536  {}
// moderate improvement for int8/16/32 with 0%, 0.01%, 1% nulls, nice
SumKernelInt8/1048576/10000      9.069 GiB/sec   12.978 GiB/sec    43.096  {}
SumKernelInt32/1048576/10000    26.149 GiB/sec   33.499 GiB/sec    28.105  {}
SumKernelInt16/1048576/10000    17.783 GiB/sec   22.388 GiB/sec    25.897  {}
SumKernelInt16/1048576/100       3.447 GiB/sec    4.173 GiB/sec    21.074  {}
SumKernelInt32/1048576/100       7.343 GiB/sec    8.423 GiB/sec    14.719  {}
SumKernelInt8/1048576/0         16.225 GiB/sec   18.284 GiB/sec    12.694  {}
SumKernelInt8/1048576/100        1.827 GiB/sec    2.036 GiB/sec    11.447  {}
SumKernelInt16/1048576/0        27.901 GiB/sec   30.025 GiB/sec     7.612  {}
SumKernelInt32/1048576/0        40.628 GiB/sec   43.467 GiB/sec     6.987  {}

----------------------------------------------------------------------------------
Regressions: (18)
----------------------------------------------------------------------------------
benchmark                             baseline        contender  change %  counters
// big drop for floating points, expected
SumKernelFloat/1048576/100       6.074 GiB/sec    4.074 GiB/sec   -32.925  {}
SumKernelDouble/1048576/100     16.927 GiB/sec   10.762 GiB/sec   -36.421  {}
SumKernelDouble/1048576/10000   36.039 GiB/sec   20.847 GiB/sec   -42.154  {}
SumKernelDouble/1048576/0       48.230 GiB/sec   20.918 GiB/sec   -56.629  {}
SumKernelFloat/1048576/10        3.689 GiB/sec    1.199 GiB/sec   -67.497  {}
SumKernelFloat/1048576/10000    24.296 GiB/sec    6.896 GiB/sec   -71.618  {}
SumKernelDouble/1048576/10       9.678 GiB/sec    2.569 GiB/sec   -73.459  {}
SumKernelFloat/1048576/0        35.913 GiB/sec    7.111 GiB/sec   -80.198  {}
// moderate drop for int64 with 10%, 50% nulls
SumKernelInt64/1048576/10        3.643 GiB/sec    3.375 GiB/sec    -7.339  {}
SumKernelInt64/1048576/2         2.418 GiB/sec    2.129 GiB/sec   -11.952  {}
// huge drop for int8/16/32 with 10%, 50% nulls, may be improved
SumKernelInt16/1048576/10        1.677 GiB/sec  924.529 MiB/sec   -46.153  {}
SumKernelInt32/1048576/10        3.143 GiB/sec    1.691 GiB/sec   -46.203  {}
SumKernelInt8/1048576/10         1.192 GiB/sec  441.498 MiB/sec   -63.831  {}
SumKernelInt32/1048576/2         4.269 GiB/sec    1.126 GiB/sec   -73.619  {}
SumKernelInt16/1048576/2         3.281 GiB/sec  621.903 MiB/sec   -81.490  {}
SumKernelInt8/1048576/2          2.412 GiB/sec  303.954 MiB/sec   -87.691  {}
SumKernelFloat/1048576/2         4.729 GiB/sec  729.679 MiB/sec   -84.932  {}
SumKernelDouble/1048576/2       11.793 GiB/sec    1.373 GiB/sec   -88.362  {}
```
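A note on reading the table, as a rough sketch rather than a description of the benchmark tooling. Going by the annotations, the last benchmark argument looks like an inverse null proportion (10000 for 0.01%, 100 for 1%, 10 for 10%, 2 for 50%, 1 for 100%), with 0 meaning no nulls. The `change %` column appears to be the relative difference of contender over baseline. Both readings are assumptions inferred from the numbers, and the helper names below are hypothetical, not part of Arrow or its tooling.

```python
# Hypothetical helpers for reading the table; not part of Arrow or its tooling.

# Throughput units seen in the table, expressed in GiB/sec.
UNIT_TO_GIB = {"GiB/sec": 1.0, "MiB/sec": 1.0 / 1024.0}


def to_gib_per_sec(text):
    """Convert a throughput string such as '924.529 MiB/sec' to GiB/sec."""
    value, unit = text.split()
    return float(value) * UNIT_TO_GIB[unit]


def change_percent(baseline, contender):
    """Relative change of contender over baseline in percent (assumed formula)."""
    base = to_gib_per_sec(baseline)
    cont = to_gib_per_sec(contender)
    return (cont - base) / base * 100.0


def null_percent(benchmark_name):
    """Null percentage implied by the trailing argument.

    Assumption taken from the annotations: 0 means no nulls,
    any other value N means one null in every N values.
    """
    inverse = int(benchmark_name.rsplit("/", 1)[1])
    return 0.0 if inverse == 0 else 100.0 / inverse


# SumKernelInt16/1048576/10 mixes GiB/sec and MiB/sec, so units must be normalized.
print(null_percent("SumKernelInt16/1048576/10"))
print(round(change_percent("1.677 GiB/sec", "924.529 MiB/sec"), 1))
```

Recomputing from the displayed values lands within about 0.01 percentage points of the reported `change %`. The small gap is consistent with the table showing rounded throughputs.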