Re: [PR] GH-41233: [C++] Added an are_cols_sorted option to RowTableMetadata for control column sorted [arrow]

via GitHub Tue, 21 May 2024 08:56:30 -0700


ZhangHuiGui commented on PR #41234:
URL: https://github.com/apache/arrow/pull/41234#issuecomment-2122946441

### Q1
The **sorted mode** implementation has two main parts:
1. Add the input columns to the `RowTable` through `RowTableEncoder`,
following the column order produced by the sort in [1]. The main purpose of
the column sort is to build a memory-aligned `column_offsets` array, so that
the later column-to-row comparisons can use aligned memory accesses.

2. The logic at [2] in `CompareColumnsToRows` uses
`are_cols_in_encoding_order` to decide whether the offsets of the columns
being compared are taken from the `RowTable` (rows) in the order set up in
part 1.
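
To illustrate part 1, here is a minimal sketch of how sorting fixed-width
columns by decreasing width yields naturally aligned offsets. It is a
simplification, not the code at [1]: `RowLayout` and `MakeLayout` are
hypothetical names, only fixed-width columns are handled, and the sort rule
(widest first) is an assumption made for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical, simplified row layout (not Arrow's RowTableMetadata).
struct RowLayout {
  // Encoded position -> index of the input column stored there.
  std::vector<uint32_t> column_order;
  // Byte offset within a row, indexed by encoded position.
  std::vector<uint32_t> column_offsets;
};

// Build a layout from fixed column widths in bytes.
// When sort_columns is true, wider columns are placed first (assumed rule).
RowLayout MakeLayout(const std::vector<uint32_t>& widths, bool sort_columns) {
  RowLayout layout;
  layout.column_order.resize(widths.size());
  std::iota(layout.column_order.begin(), layout.column_order.end(), 0u);
  if (sort_columns) {
    std::stable_sort(
        layout.column_order.begin(), layout.column_order.end(),
        [&widths](uint32_t l, uint32_t r) { return widths[l] > widths[r]; });
  }
  // Offsets are the running sum of widths in encoded order.
  uint32_t offset = 0;
  for (uint32_t col : layout.column_order) {
    layout.column_offsets.push_back(offset);
    offset += widths[col];
  }
  return layout;
}
```

With widths `{1, 8, 4, 2}`, sorting gives offsets 0, 8, 12, 14, each a
multiple of its column's width. Without sorting, the 8-byte column starts at
offset 1.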

**Non-sorted mode** is the opposite of the above: no column sort is performed
in part 1, and part 2 uses the `column_offsets` of the original input
columns.
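
The offset lookup in part 2 can be sketched roughly as follows. This is an
illustration under assumptions, not the code at [2]: `EncodedLayout` and
`OffsetWithinRow` are hypothetical names, and only the flag name
`are_cols_in_encoding_order` is taken from the discussion above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the row metadata consulted during comparison.
struct EncodedLayout {
  // Input column index -> position of that column inside the encoded row.
  std::vector<uint32_t> pos_after_encoding;
  // Byte offset within a row, indexed by encoded position.
  std::vector<uint32_t> offsets;
};

// Return the byte offset inside a row for the icol-th column being compared.
// If the compared columns are already listed in encoding order, icol is used
// directly; otherwise it is first mapped to its encoded position.
uint32_t OffsetWithinRow(const EncodedLayout& layout, uint32_t icol,
                         bool are_cols_in_encoding_order) {
  const uint32_t pos =
      are_cols_in_encoding_order ? icol : layout.pos_after_encoding[icol];
  return layout.offsets[pos];
}
```

When no sort was applied, the mapping is the identity, so both values of the
flag yield the same offset and the extra lookup is redundant.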

### Q2
As I understand it, the use case for non-sorted mode is when the input
columns of `RowTableEncoder` are already sorted (that is, they can already be
accessed with aligned memory). The comparison can then follow the input
column order [2] without a memory-alignment check. This is also why
non-sorted mode performs better in the benchmark above.

[1]
https://github.com/apache/arrow/blob/e254c43c095bd6e33d07129257e11760f885f299/cpp/src/arrow/compute/row/row_internal.cc#L88

[2]
https://github.com/apache/arrow/blob/e254c43c095bd6e33d07129257e11760f885f299/cpp/src/arrow/compute/row/compare_internal.cc#L366-L368

