ZhangHuiGui commented on PR #41234: URL: https://github.com/apache/arrow/pull/41234#issuecomment-2122946441
### Q1

The **sorted mode** implementation has two main parts:

1. Add the input columns to the `RowTable` through the `RowTableEncoder`, ordered according to the column sort in [1]. The main purpose of the column sort is to build a memory-aligned `column_offsets` array, so that the later column comparisons can use memory-aligned accesses.
2. The logic at [2] in `CompareColumnsToRows` decides, based on `are_cols_in_encoding_order`, whether to take the offset of each compared column from the `RowTable` (rows) in the sorted order from part 1.

**Non-sorted mode** is the opposite of the above logic: no column sorting is performed in part 1, and the `column_offsets` of the original input columns are used in part 2.

### Q2

I understand that the application scenario of non-sorted mode is when the input columns of `RowTableEncoder` are already sorted (that is, they can be accessed with memory alignment). When comparing, the columns can then be compared in input order [2] without a memory-alignment check. This is also the reason for the improved performance of the non-sorted mode in the benchmark above.

[1] https://github.com/apache/arrow/blob/e254c43c095bd6e33d07129257e11760f885f299/cpp/src/arrow/compute/row/row_internal.cc#L88
[2] https://github.com/apache/arrow/blob/e254c43c095bd6e33d07129257e11760f885f299/cpp/src/arrow/compute/row/compare_internal.cc#L366-L368
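
To make the two parts in Q1 concrete, here is a minimal, self-contained sketch of the idea. It is not the Arrow code: `RowLayout`, `MakeLayout`, and `OffsetOfInputColumn` are hypothetical names, only fixed-width columns are modeled, and "sort by decreasing width" is a simplified stand-in for the real ordering rule at [1]. Only the flag name `are_cols_in_encoding_order` comes from the discussion above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical, simplified model of a row layout (not Arrow's actual types).
struct RowLayout {
  std::vector<uint32_t> column_order;      // encoding position -> input column index
  std::vector<uint32_t> pos_of_input_col;  // input column index -> encoding position
  std::vector<uint32_t> column_offsets;    // encoding position -> byte offset in a row
};

// Part 1: lay out fixed-width columns inside a row.
// sorted == true  : reorder columns by decreasing width (assumed, simplified rule).
// sorted == false : keep the input order as-is.
RowLayout MakeLayout(const std::vector<uint32_t>& widths, bool sorted) {
  RowLayout layout;
  layout.column_order.resize(widths.size());
  std::iota(layout.column_order.begin(), layout.column_order.end(), 0u);
  if (sorted) {
    std::stable_sort(layout.column_order.begin(), layout.column_order.end(),
                     [&](uint32_t a, uint32_t b) { return widths[a] > widths[b]; });
  }
  layout.pos_of_input_col.resize(widths.size());
  uint32_t offset = 0;
  for (uint32_t pos = 0; pos < layout.column_order.size(); ++pos) {
    uint32_t col = layout.column_order[pos];
    layout.pos_of_input_col[col] = pos;
    layout.column_offsets.push_back(offset);
    offset += widths[col];
  }
  return layout;
}

// Part 2: find where an input column lives inside a row.
// If the caller's columns are already in encoding order, index the offsets
// directly; otherwise map the input index to its encoding position first.
uint32_t OffsetOfInputColumn(const RowLayout& layout, uint32_t input_col,
                             bool are_cols_in_encoding_order) {
  assert(input_col < layout.column_offsets.size());
  uint32_t pos = are_cols_in_encoding_order ? input_col
                                            : layout.pos_of_input_col[input_col];
  return layout.column_offsets[pos];
}
```

With column widths `{1, 8, 4}`, the sorted layout places the 8-byte column at offset 0 and the 4-byte column at offset 8, so both sit on naturally aligned offsets. The unsorted layout places the 8-byte column at offset 1, which is why skipping the sort only makes sense when the input order already yields aligned offsets.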
