ZhangHuiGui commented on PR #41036:
URL: https://github.com/apache/arrow/pull/41036#issuecomment-2049955777

   > ```shell
   > 024/2                                                279436 us
   > ```
   
   Ah, nice catch. 
   The slowdown here is probably due to two factors:
   1. A random null ratio above 50% increases the cost of comparison, because 
CompareColumnsToRows has to take more branches.
   
https://github.com/apache/arrow/blob/62693170aee3bea2dfec272e51bf3bc4d1297a53/cpp/src/arrow/compute/row/compare_internal.cc#L382-L391
   2. int32+int64 performs much worse than int32+int32 because the columns in 
each row have different col_width values, so the comparison enters a different 
branch for each one, which disrupts the CPU pipeline.
   
   
https://github.com/apache/arrow/blob/62693170aee3bea2dfec272e51bf3bc4d1297a53/cpp/src/arrow/compute/row/compare_internal.cc#L176-L199
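
To make the two points above concrete, here is a minimal sketch of where such branches show up when comparing column values against row-encoded data. It is not the code in compare_internal.cc: `Column`, `RowTable`, `EqualAs` and `RowEquals` are hypothetical names made up for illustration, the layout is heavily simplified, and treating two nulls as equal is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-ins; these are not Arrow's actual types.
struct Column {
  uint32_t col_width;             // fixed width in bytes (1, 2, 4 or 8)
  std::vector<uint8_t> values;    // col_width bytes per row
  std::vector<uint8_t> is_null;   // one byte per row, non-zero means null
};

struct RowTable {
  uint32_t row_width;             // total bytes per encoded row
  std::vector<uint32_t> offsets;  // byte offset of each column inside a row
  std::vector<uint8_t> data;      // row_width bytes per row
  std::vector<uint8_t> is_null;   // offsets.size() bytes per row
};

// Load two values of type T from possibly unaligned bytes and compare them.
template <typename T>
bool EqualAs(const uint8_t* a, const uint8_t* b) {
  T x, y;
  std::memcpy(&x, a, sizeof(T));
  std::memcpy(&y, b, sizeof(T));
  return x == y;
}

// Returns true when row `col_row` of the columns equals row `tbl_row` of the table.
bool RowEquals(const std::vector<Column>& cols, size_t col_row,
               const RowTable& table, size_t tbl_row) {
  const size_t num_cols = cols.size();
  for (size_t c = 0; c < num_cols; ++c) {
    const Column& col = cols[c];
    const bool left_null = col.is_null[col_row] != 0;
    const bool right_null = table.is_null[tbl_row * num_cols + c] != 0;
    // Null handling: data-dependent branches that are hard to predict
    // when nulls are randomly distributed.
    if (left_null || right_null) {
      if (left_null != right_null) return false;
      continue;  // both null: treated as equal in this sketch (assumption)
    }
    const uint8_t* a = col.values.data() + col_row * col.col_width;
    const uint8_t* b =
        table.data.data() + tbl_row * table.row_width + table.offsets[c];
    bool equal = false;
    // Width dispatch: with int32+int64 the taken case alternates between
    // 4 and 8, while with int32+int32 it is always the same case.
    switch (col.col_width) {
      case 1: equal = EqualAs<uint8_t>(a, b); break;
      case 2: equal = EqualAs<uint16_t>(a, b); break;
      case 4: equal = EqualAs<uint32_t>(a, b); break;
      case 8: equal = EqualAs<uint64_t>(a, b); break;
      default: assert(false && "unsupported col_width in this sketch");
    }
    if (!equal) return false;
  }
  return true;
}
```

In this sketch, an int32+int32 key always takes the same `case` in the switch, whereas an int32+int64 key alternates between two cases, and every null adds another data-dependent branch on top of that.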


