cyb70289 commented on pull request #9635:
URL: https://github.com/apache/arrow/pull/9635#issuecomment-796477621


   Sum kernel performance compared against the master branch, measured on Skylake with clang-9.
   - Big drop for floating point types, as expected.
   - Consistent improvement for integers with at most 1% nulls.
   - Big drop for int8/16/32 with many nulls (10%, 50%), moderate drop for int64.
   
   ```
   
   ----------------------------------------------------------------------------------
   Non-regressions: (18)
   ----------------------------------------------------------------------------------
                       benchmark         baseline        contender  change %  counters
   
   // big improvement for 100% nulls, not very useful
        SumKernelFloat/1048576/1  114.281 GiB/sec  853.661 GiB/sec   646.987        {}
       SumKernelDouble/1048576/1  203.019 GiB/sec  866.872 GiB/sec   326.991        {}
        SumKernelInt16/1048576/1   32.426 GiB/sec  122.892 GiB/sec   278.996        {}
        SumKernelInt32/1048576/1   56.421 GiB/sec  212.625 GiB/sec   276.853        {}
         SumKernelInt8/1048576/1   27.944 GiB/sec   66.970 GiB/sec   139.653        {}
        SumKernelInt64/1048576/1  148.757 GiB/sec  351.812 GiB/sec   136.500        {}
   
   // big improvement for int64 with 0%, 0.01%, 1% nulls, nice
        SumKernelInt64/1048576/0   17.103 GiB/sec   41.992 GiB/sec   145.531        {}
    SumKernelInt64/1048576/10000   15.393 GiB/sec   34.708 GiB/sec   125.477        {}
      SumKernelInt64/1048576/100   11.021 GiB/sec   17.252 GiB/sec    56.536        {}
   
   // moderate improvement for int8/16/32 with 0%, 0.01%, 1% nulls, nice
     SumKernelInt8/1048576/10000    9.069 GiB/sec   12.978 GiB/sec    43.096        {}
    SumKernelInt32/1048576/10000   26.149 GiB/sec   33.499 GiB/sec    28.105        {}
    SumKernelInt16/1048576/10000   17.783 GiB/sec   22.388 GiB/sec    25.897        {}
      SumKernelInt16/1048576/100    3.447 GiB/sec    4.173 GiB/sec    21.074        {}
      SumKernelInt32/1048576/100    7.343 GiB/sec    8.423 GiB/sec    14.719        {}
         SumKernelInt8/1048576/0   16.225 GiB/sec   18.284 GiB/sec    12.694        {}
       SumKernelInt8/1048576/100    1.827 GiB/sec    2.036 GiB/sec    11.447        {}
        SumKernelInt16/1048576/0   27.901 GiB/sec   30.025 GiB/sec     7.612        {}
        SumKernelInt32/1048576/0   40.628 GiB/sec   43.467 GiB/sec     6.987        {}
   
   
   ----------------------------------------------------------------------------------
   Regressions: (18)
   ----------------------------------------------------------------------------------
                        benchmark        baseline        contender  change %  counters
   
   // big drop for floating points, expected
       SumKernelFloat/1048576/100   6.074 GiB/sec    4.074 GiB/sec   -32.925        {}
      SumKernelDouble/1048576/100  16.927 GiB/sec   10.762 GiB/sec   -36.421        {}
    SumKernelDouble/1048576/10000  36.039 GiB/sec   20.847 GiB/sec   -42.154        {}
        SumKernelDouble/1048576/0  48.230 GiB/sec   20.918 GiB/sec   -56.629        {}
        SumKernelFloat/1048576/10   3.689 GiB/sec    1.199 GiB/sec   -67.497        {}
     SumKernelFloat/1048576/10000  24.296 GiB/sec    6.896 GiB/sec   -71.618        {}
       SumKernelDouble/1048576/10   9.678 GiB/sec    2.569 GiB/sec   -73.459        {}
         SumKernelFloat/1048576/0  35.913 GiB/sec    7.111 GiB/sec   -80.198        {}
   
   // moderate drop for int64 with 10%, 50% nulls
        SumKernelInt64/1048576/10   3.643 GiB/sec    3.375 GiB/sec    -7.339        {}
         SumKernelInt64/1048576/2   2.418 GiB/sec    2.129 GiB/sec   -11.952        {}
   
   // huge drop for int8/16/32 with 10%, 50% nulls, may be improved
        SumKernelInt16/1048576/10   1.677 GiB/sec  924.529 MiB/sec   -46.153        {}
        SumKernelInt32/1048576/10   3.143 GiB/sec    1.691 GiB/sec   -46.203        {}
         SumKernelInt8/1048576/10   1.192 GiB/sec  441.498 MiB/sec   -63.831        {}
         SumKernelInt32/1048576/2   4.269 GiB/sec    1.126 GiB/sec   -73.619        {}
         SumKernelInt16/1048576/2   3.281 GiB/sec  621.903 MiB/sec   -81.490        {}
          SumKernelInt8/1048576/2   2.412 GiB/sec  303.954 MiB/sec   -87.691        {}

   // big drop for floating points with 50% nulls as well
         SumKernelFloat/1048576/2   4.729 GiB/sec  729.679 MiB/sec   -84.932        {}
        SumKernelDouble/1048576/2  11.793 GiB/sec    1.373 GiB/sec   -88.362        {}
   ```
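
For anyone re-reading the table, here is a minimal Python sketch of how the columns can be interpreted. It is not part of the benchmark tooling. The helper names (`null_percent`, `to_gib_per_sec`, `change_percent`) are hypothetical, and the argument encoding (0 means no nulls, N means one value in N is null) is inferred from the comments in the table above. The sketch also normalizes MiB/sec to GiB/sec, since some contender values are reported in the smaller unit.

```python
# Throughput units as printed in the table, normalized to GiB/sec.
# Assumption: binary prefixes, so 1 GiB = 1024 MiB.
_UNIT_TO_GIB = {"GiB/sec": 1.0, "MiB/sec": 1.0 / 1024.0}


def null_percent(benchmark_name: str) -> float:
    """Decode the trailing argument of e.g. 'SumKernelInt64/1048576/100'.

    Assumed encoding (inferred from the table comments): 0 means no nulls,
    otherwise N means a fraction 1/N of the values are null.
    """
    inverse = int(benchmark_name.rsplit("/", 1)[1])
    return 0.0 if inverse == 0 else 100.0 / inverse


def to_gib_per_sec(throughput: str) -> float:
    """Parse a cell such as '924.529 MiB/sec' into GiB/sec."""
    value, unit = throughput.split()
    return float(value) * _UNIT_TO_GIB[unit]


def change_percent(baseline: str, contender: str) -> float:
    """Relative change of contender over baseline, in percent."""
    base = to_gib_per_sec(baseline)
    return (to_gib_per_sec(contender) - base) / base * 100.0


if __name__ == "__main__":
    # Row taken from the regressions table above.
    name = "SumKernelInt16/1048576/10"
    print(name, null_percent(name), "% nulls")
    print(round(change_percent("1.677 GiB/sec", "924.529 MiB/sec"), 1), "% change")
```

Values recomputed this way agree with the `change %` column only approximately, presumably because the tool works from unrounded measurements rather than the printed throughput.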

