ZhangHuiGui commented on PR #41036: URL: https://github.com/apache/arrow/pull/41036#issuecomment-2049955777
> ```shell
> 024/2 279436 us
> ```

Ah, nice catch. The performance problem here is most likely due to two reasons:

1. A random null ratio of more than 50% increases the cost of comparison, because `CompareColumnsToRows` has to go through more branches. https://github.com/apache/arrow/blob/62693170aee3bea2dfec272e51bf3bc4d1297a53/cpp/src/arrow/compute/row/compare_internal.cc#L382-L391
2. int32+int64 performs much worse than int32+int32 because the columns in each row have different `col_width` values, so the comparison has to enter different branches, which disrupts the CPU pipeline. https://github.com/apache/arrow/blob/62693170aee3bea2dfec272e51bf3bc4d1297a53/cpp/src/arrow/compute/row/compare_internal.cc#L176-L199

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
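To illustrate the first point about nulls, here is a minimal sketch of why a null-aware comparison costs more than a plain one. It is not the code in `compare_internal.cc`: the function names `CountMatchesNullable` and `CountMatchesNoNulls` are hypothetical, `std::optional` stands in for Arrow's validity bitmaps, and treating null == null as a match is an assumption made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not Arrow's API): counts positions where two nullable
// int32 columns agree. Assumption for this sketch: two nulls count as a match.
std::size_t CountMatchesNullable(
    const std::vector<std::optional<int32_t>>& left,
    const std::vector<std::optional<int32_t>>& right) {
  std::size_t matches = 0;
  const std::size_t n = std::min(left.size(), right.size());
  for (std::size_t i = 0; i < n; ++i) {
    const bool left_null = !left[i].has_value();
    const bool right_null = !right[i].has_value();
    if (left_null || right_null) {
      // Null path: a data-dependent branch. With a random null ratio this
      // branch is hard to predict, which is where the extra cost comes from.
      if (left_null && right_null) ++matches;
    } else {
      // Value path: both sides are valid, so compare the payloads.
      if (*left[i] == *right[i]) ++matches;
    }
  }
  return matches;
}

// Hypothetical fast path for columns known to contain no nulls: a single
// comparison per element and no validity checks.
std::size_t CountMatchesNoNulls(const std::vector<int32_t>& left,
                                const std::vector<int32_t>& right) {
  std::size_t matches = 0;
  const std::size_t n = std::min(left.size(), right.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (left[i] == right[i]) ++matches;
  }
  return matches;
}
```

The nullable version needs a validity check on every element before it can touch the values, while the no-null version is a straight loop that the compiler can vectorize more easily. The actual Arrow code works on bitmaps and row tables rather than `std::optional`, so this only shows the shape of the extra branching, not its real cost.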
