ZhangHuiGui commented on PR #41234:
URL: https://github.com/apache/arrow/pull/41234#issuecomment-2122946441

   ### Q1
   **Sorted mode** is implemented in two main parts:
   1. The input columns are encoded into the `RowTable` by the `RowTableEncoder`, using the column order computed in [1]. The main purpose of sorting the columns is to build a `column_offsets` array in which each column is memory-aligned, so that the later column-to-row comparisons can operate on aligned memory.
   
   2. In `CompareColumnsToRows`, the logic at [2] uses `are_cols_in_encoding_order` to decide whether the offset of each compared column in the `RowTable` (rows) is looked up according to the column order from part 1.
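   The two parts above can be sketched as follows. This is a simplified illustration, not Arrow's implementation: it assumes fixed-width columns only and an ordering rule (power-of-two widths first, widest first) that approximates what the sort in [1] aims for. `SketchLayout`, `BuildSortedLayout`, `OffsetWithinRow`, and `inverse_column_order` are hypothetical names; only `column_offsets` and `are_cols_in_encoding_order` come from the discussion above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical, simplified stand-in for a row layout with fixed-width
// columns only (the real metadata also handles variable-length columns).
struct SketchLayout {
  std::vector<uint32_t> column_order;          // encoding pos -> input column id
  std::vector<uint32_t> inverse_column_order;  // input column id -> encoding pos
  std::vector<uint32_t> column_offsets;        // encoding pos -> byte offset in row
};

inline bool IsPow2(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Part 1 (sorted mode, assumed rule): power-of-two widths first, widest first,
// so each such column starts at a multiple of its own width within the row.
SketchLayout BuildSortedLayout(const std::vector<uint32_t>& widths) {
  SketchLayout layout;
  const uint32_t n = static_cast<uint32_t>(widths.size());
  layout.column_order.resize(n);
  std::iota(layout.column_order.begin(), layout.column_order.end(), 0u);
  std::stable_sort(layout.column_order.begin(), layout.column_order.end(),
                   [&](uint32_t l, uint32_t r) {
                     bool lp = IsPow2(widths[l]), rp = IsPow2(widths[r]);
                     if (lp != rp) return lp;       // power-of-two widths first
                     if (!lp) return false;         // keep input order otherwise
                     return widths[l] > widths[r];  // wider columns first
                   });
  layout.inverse_column_order.resize(n);
  layout.column_offsets.resize(n);
  uint32_t offset = 0;
  for (uint32_t pos = 0; pos < n; ++pos) {
    uint32_t id = layout.column_order[pos];
    layout.inverse_column_order[id] = pos;
    layout.column_offsets[pos] = offset;
    offset += widths[id];
  }
  return layout;
}

// Part 2: locate input column `icol` inside a row. If the caller's columns
// are already in encoding order, `icol` is used as the encoding position
// directly; otherwise it is translated through the inverse order first.
uint32_t OffsetWithinRow(const SketchLayout& layout, uint32_t icol,
                         bool are_cols_in_encoding_order) {
  assert(icol < layout.column_offsets.size());
  uint32_t pos = are_cols_in_encoding_order ? icol
                                            : layout.inverse_column_order[icol];
  return layout.column_offsets[pos];
}
```

   For widths `{1, 8, 3, 4}` this sketch gives the encoding order `{1, 3, 0, 2}` and offsets `{0, 8, 12, 13}`, so the 8-, 4- and 1-byte columns each start at a multiple of their width.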
   
   **Non-sorted mode** does the opposite: part 1 performs no column sorting, and part 2 uses the `column_offsets` of the original input columns.
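   For contrast, a sketch of the non-sorted layout under the same simplifying assumption (fixed-width columns only). `InputOrderOffsets` and `AllNaturallyAligned` are hypothetical helpers: offsets simply accumulate in input order, so a wide column can land at an offset that is not a multiple of its width.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-sorted layout: columns keep their input order, so the
// offset of column i is the sum of the widths of columns 0..i-1.
std::vector<uint32_t> InputOrderOffsets(const std::vector<uint32_t>& widths) {
  std::vector<uint32_t> offsets(widths.size());
  uint32_t offset = 0;
  for (std::size_t i = 0; i < widths.size(); ++i) {
    offsets[i] = offset;
    offset += widths[i];
  }
  return offsets;
}

// True if every power-of-two-width column starts at a multiple of its width,
// measured from the start of the row.
bool AllNaturallyAligned(const std::vector<uint32_t>& widths,
                         const std::vector<uint32_t>& offsets) {
  for (std::size_t i = 0; i < widths.size(); ++i) {
    uint32_t w = widths[i];
    bool pow2 = w != 0 && (w & (w - 1)) == 0;
    if (pow2 && offsets[i] % w != 0) return false;
  }
  return true;
}
```

   With widths `{1, 8, 3, 4}` the 8-byte column starts at offset 1, while `{8, 4, 1}` stays aligned; under this sketch, non-sorted mode keeps alignment only when the caller supplies a suitable column order.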
   
   ### Q2
   My understanding is that non-sorted mode is intended for the case where the input columns of `RowTableEncoder` are already sorted (that is, they can already be accessed with aligned memory loads). Comparisons can then follow the order of the input columns [2] without a memory-alignment check. This is also why non-sorted mode performs better in the benchmark above.
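   The precondition described above can be expressed as a check. This sketch again assumes a simplified, fixed-width-only ordering rule (power-of-two widths first, widest first, other widths in input order); `EncodesBefore` and `InputAlreadyInEncodingOrder` are hypothetical helpers, not Arrow APIs. When the check holds, a stable sort would leave the columns unchanged, so both modes would produce the same `column_offsets`.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical ordering rule: power-of-two widths first, widest first;
// non-power-of-two widths are treated as equivalent (input order kept).
bool EncodesBefore(uint32_t l, uint32_t r) {
  bool lp = l != 0 && (l & (l - 1)) == 0;
  bool rp = r != 0 && (r & (r - 1)) == 0;
  if (lp != rp) return lp;
  if (!lp) return false;
  return l > r;
}

// If the input widths are already ordered by this rule, a stable sort is a
// no-op: sorted and non-sorted modes yield identical offsets, and the compare
// step can index column_offsets by input position directly.
bool InputAlreadyInEncodingOrder(const std::vector<uint32_t>& widths) {
  return std::is_sorted(widths.begin(), widths.end(), EncodesBefore);
}
```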
   
   
   [1] https://github.com/apache/arrow/blob/e254c43c095bd6e33d07129257e11760f885f299/cpp/src/arrow/compute/row/row_internal.cc#L88
   
   [2] https://github.com/apache/arrow/blob/e254c43c095bd6e33d07129257e11760f885f299/cpp/src/arrow/compute/row/compare_internal.cc#L366-L368
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
