yordan-pavlov opened a new pull request #7204:
URL: https://github.com/apache/arrow/pull/7204


   This pull request completes the SIMD implementation of the comparison kernel 
in simd_compare_op by using a bitmask SIMD operation instead of a for loop to 
copy the results of the comparison.
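
   To illustrate the idea (this is not the code in this pull request), here is 
a minimal sketch of packing comparison results with a single bitmask 
operation instead of a per-lane loop. It assumes x86_64 SSE2 intrinsics from 
std::arch as a stand-in for the SIMD abstraction used in the kernel, and the 
function name eq_i8_bitmap is hypothetical:

```rust
/// Compare two i8 slices for equality and pack the results into a bitmap,
/// one bit per element, least significant bit first.
/// Hypothetical helper for illustration; not the Arrow kernel itself.
fn eq_i8_bitmap(left: &[i8], right: &[i8]) -> Vec<u8> {
    assert_eq!(left.len(), right.len());
    let len = left.len();
    let mut out = vec![0u8; (len + 7) / 8];
    let mut i = 0;

    #[cfg(target_arch = "x86_64")]
    {
        use std::arch::x86_64::{
            __m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8,
        };
        while i + 16 <= len {
            // SAFETY: SSE2 is part of the x86_64 baseline, and both unaligned
            // loads read 16 bytes that are in bounds because i + 16 <= len.
            let mask = unsafe {
                let l = _mm_loadu_si128(left.as_ptr().add(i) as *const __m128i);
                let r = _mm_loadu_si128(right.as_ptr().add(i) as *const __m128i);
                // One instruction turns 16 lane results into 16 bits,
                // replacing a loop that would copy them one at a time.
                _mm_movemask_epi8(_mm_cmpeq_epi8(l, r)) as u16
            };
            // i is a multiple of 16 here, so the 16 bits fill two whole bytes.
            out[i / 8..i / 8 + 2].copy_from_slice(&mask.to_le_bytes());
            i += 16;
        }
    }

    // Scalar path: the remaining tail, or everything on other architectures.
    while i < len {
        if left[i] == right[i] {
            out[i / 8] |= 1 << (i % 8);
        }
        i += 1;
    }
    out
}
```

   Each 16-lane chunk needs a single two-byte store, so the cost of writing 
results no longer grows with the number of lanes in the chunk.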
   
   Previously, simd_compare_op was at most about 10% faster than the 
non-SIMD implementation and took approximately the same time for types of 
different sizes (which indicates that the SIMD implementation was not 
complete). Below are benchmark results for the old implementation with 
Int8 and Float32 types:
   
   eq Int8                 time:   [947.53 us 947.81 us 948.05 us]
   eq Int8 simd            time:   [855.02 us 858.26 us 862.48 us]
   neq Int8                time:   [904.09 us 907.34 us 911.44 us]
   neq Int8 simd           time:   [848.49 us 849.28 us 850.28 us]
   lt Int8                 time:   [900.87 us 902.65 us 904.86 us]
   lt Int8 simd            time:   [850.32 us 850.96 us 851.90 us]
   lt_eq Int8              time:   [974.68 us 983.03 us 991.98 us]
   lt_eq Int8 simd         time:   [851.83 us 852.22 us 852.74 us]
   gt Int8                 time:   [908.48 us 911.76 us 914.72 us]
   gt Int8 simd            time:   [851.93 us 852.43 us 853.04 us]
   gt_eq Int8              time:   [981.53 us 983.37 us 986.31 us]
   gt_eq Int8 simd         time:   [855.59 us 856.83 us 858.61 us]
   
   eq Float32              time:   [911.46 us 911.70 us 912.01 us]
   eq Float32 simd         time:   [884.74 us 885.97 us 887.74 us]
   neq Float32             time:   [904.26 us 904.73 us 905.27 us]
   neq Float32 simd        time:   [884.40 us 892.32 us 901.98 us]
   lt Float32              time:   [907.90 us 908.54 us 909.34 us]
   lt Float32 simd         time:   [883.23 us 886.05 us 889.31 us]
   lt_eq Float32           time:   [911.44 us 911.62 us 911.82 us]
   lt_eq Float32 simd      time:   [882.78 us 886.78 us 891.05 us]
   gt Float32              time:   [906.88 us 907.96 us 909.32 us]
   gt Float32 simd         time:   [879.78 us 883.03 us 886.63 us]
   gt_eq Float32           time:   [924.72 us 926.03 us 928.29 us]
   gt_eq Float32 simd      time:   [884.80 us 885.93 us 887.35 us]
   
   In the benchmark results above, notice how both the SIMD and non-SIMD 
operations take a similar amount of time for types of different sizes (Int8 
and Float32). This is expected for a non-SIMD implementation but not for a 
SIMD implementation, because a SIMD register holds more values when each 
value is smaller, so more values are processed per instruction.
   
   After the change proposed in this pull request, the SIMD kernels are about 
10 times faster for Float32 and about 40 times faster for Int8. The results 
below indicate that the SIMD implementation is now complete, as operations 
take approximately 4 times less time for a type that is 4 times smaller. Here 
are the benchmark results:
   
   eq Int8                 time:   [949.64 us 951.38 us 953.43 us]
   eq Int8 simd            time:   [20.569 us 20.576 us 20.583 us]
   neq Int8                time:   [903.97 us 908.96 us 914.34 us]
   neq Int8 simd           time:   [20.675 us 20.741 us 20.840 us]
   lt Int8                 time:   [894.93 us 895.99 us 897.55 us]
   lt Int8 simd            time:   [21.564 us 22.105 us 22.802 us]
   lt_eq Int8              time:   [950.65 us 952.05 us 954.11 us]
   lt_eq Int8 simd         time:   [21.072 us 21.220 us 21.399 us]
   gt Int8                 time:   [894.76 us 895.47 us 896.61 us]
   gt Int8 simd            time:   [21.680 us 22.546 us 23.465 us]
   gt_eq Int8              time:   [948.65 us 949.53 us 950.71 us]
   gt_eq Int8 simd         time:   [21.245 us 21.329 us 21.473 us]
   
   eq Float32              time:   [920.17 us 923.43 us 927.43 us]
   eq Float32 simd         time:   [79.024 us 79.179 us 79.372 us]
   neq Float32             time:   [910.46 us 912.84 us 915.68 us]
   neq Float32 simd        time:   [78.683 us 78.992 us 79.452 us]
   lt Float32              time:   [905.08 us 907.03 us 909.38 us]
   lt Float32 simd         time:   [79.296 us 79.545 us 79.873 us]
   lt_eq Float32           time:   [920.25 us 923.87 us 928.94 us]
   lt_eq Float32 simd      time:   [80.974 us 81.584 us 82.365 us]
   gt Float32              time:   [911.43 us 916.43 us 923.39 us]
   gt Float32 simd         time:   [80.079 us 80.336 us 80.692 us]
   gt_eq Float32           time:   [923.22 us 923.76 us 924.37 us]
   gt_eq Float32 simd      time:   [81.175 us 81.402 us 81.683 us]
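
   As a quick check of the ratios quoted above, the following sketch 
recomputes them from the middle (estimate) values of the "eq" SIMD 
benchmarks listed in this description. The helper name speedup is 
hypothetical; the numbers are copied from the tables:

```rust
/// Ratio of two timings in microseconds (how many times faster `after` is).
fn speedup(before_us: f64, after_us: f64) -> f64 {
    before_us / after_us
}

/// Returns (Int8 speedup, Float32 speedup, Float32/Int8 time ratio, size ratio).
fn summarize() -> (f64, f64, f64, usize) {
    // "eq Int8 simd": 858.26 us before, 20.576 us after.
    let int8 = speedup(858.26, 20.576);
    // "eq Float32 simd": 885.97 us before, 79.179 us after.
    let float32 = speedup(885.97, 79.179);
    // After the change, Float32 takes roughly this many times longer than Int8.
    let time_ratio = 79.179 / 20.576;
    // A Float32 value is this many times larger than an Int8 value.
    let size_ratio = std::mem::size_of::<f32>() / std::mem::size_of::<i8>();
    (int8, float32, time_ratio, size_ratio)
}
```

   With these inputs the speedups come out near 41.7 and 11.2, and the 
Float32 to Int8 time ratio is about 3.85 against a size ratio of 4.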
   
   @paddyhoran let me know what you think



