westonpace commented on code in PR #40817:
URL: https://github.com/apache/arrow/pull/40817#discussion_r1547783579
##########
cpp/src/arrow/acero/partition_util.h:
##########
@@ -62,7 +62,7 @@ class PartitionSort {
template <class INPUT_PRTN_ID_FN, class OUTPUT_POS_FN>
static void Eval(int64_t num_rows, int num_prtns, uint16_t* prtn_ranges,
INPUT_PRTN_ID_FN prtn_id_impl, OUTPUT_POS_FN output_pos_impl) {
- ARROW_DCHECK(num_rows > 0 && num_rows <= (1 << 15));
+ ARROW_DCHECK(num_rows > 0 && num_rows <= ((1 << 16) - 1));
ARROW_DCHECK(num_prtns >= 1 && num_prtns <= (1 << 15));
Review Comment:
I believe the goal here is to use up to `dop_` partitions, but only that
many if we have enough rows to justify it. We only want to create more
partitions if each of them will have at least `min_num_rows_per_prtn` rows.
If there are not very many rows, then we use fewer partitions.
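A minimal C++ sketch of that sizing rule, under stated assumptions: `ChooseLogNumPartitions` and `FloorLog2` are hypothetical helpers (not the Acero code), `dop` stands in for `dop_`, and "enough rows" is read as an average of at least `min_num_rows_per_prtn` rows per partition. It picks the largest power of 2 that exceeds neither limit and never drops below one partition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: floor(log2(x)) for x >= 1.
inline int FloorLog2(uint64_t x) {
  assert(x >= 1);
  int log = 0;
  while (x > 1) {
    x >>= 1;
    ++log;
  }
  return log;
}

// Hypothetical sketch: returns log2 of the partition count, chosen so that
//  - there are at most `dop` partitions, and
//  - each partition gets at least `min_num_rows_per_prtn` rows on average,
// while always using at least one partition (log == 0).
inline int ChooseLogNumPartitions(int64_t num_rows, int dop,
                                  int64_t min_num_rows_per_prtn) {
  assert(dop >= 1);
  assert(min_num_rows_per_prtn >= 1);
  // How many partitions the row count can "afford"; never fewer than one.
  int64_t affordable = num_rows / min_num_rows_per_prtn;
  if (affordable < 1) affordable = 1;
  const int log_by_dop = FloorLog2(static_cast<uint64_t>(dop));
  const int log_by_rows = FloorLog2(static_cast<uint64_t>(affordable));
  // Rounding both limits down to a power of 2 keeps the count a power of 2.
  return std::min(log_by_dop, log_by_rows);
}
```

For example, with `dop = 8` and `min_num_rows_per_prtn = 1 << 18`, 2^20 rows give a log of 2 (four partitions), while 100 rows give a log of 0 (a single partition).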
Also, we require `num_prtns_` to be a power of 2 because we calculate the
partition id using masking, so we need `log_num_prtns_` to tell us how many
bits to use for the mask.
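The masking step can be sketched as follows, assuming a 32-bit hash per row and that the partition id is taken from its low `log_num_prtns` bits (the real code may use a different slice of the hash). `PartitionIdFromHash` is a hypothetical name. Because `num_prtns` is a power of 2, `num_prtns - 1` is a mask of exactly `log_num_prtns` one-bits, so the AND gives the same result as `hash % num_prtns` without a division.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: with num_prtns == 1 << log_num_prtns, the partition id
// is the lowest log_num_prtns bits of the row's hash. Which hash bits the
// real code uses is an assumption here.
inline uint32_t PartitionIdFromHash(uint32_t hash, int log_num_prtns) {
  assert(log_num_prtns >= 0 && log_num_prtns < 32);
  // (1 << k) - 1 has exactly k low one-bits, e.g. k = 3 -> 0b111.
  const uint32_t mask = (uint32_t{1} << log_num_prtns) - 1;
  return hash & mask;
}
```

For instance, `PartitionIdFromHash(13, 3)` keeps the low bits `101` of `1101` and returns 5, the same as `13 % 8`.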