saadtajwar commented on code in PR #23231:
URL: https://github.com/apache/datafusion/pull/23231#discussion_r3488970017
##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -826,6 +863,40 @@ impl BatchPartitioner {
// Finished building index-arrays for output partitions
timer.done();
+ let partitioned_batches =
+ Self::partition_grouped_take(&batch, indices,
&self.timer)?;
+
+ Box::new(partitioned_batches.into_iter())
+ }
+ BatchPartitionerState::Range {
+ ordering,
+ split_points,
+ indices,
+ partition_buffer,
+ } => {
+ // Tracking time required for distributing indexes across
output partitions
+ let timer = self.timer.timer();
+
+ let arrays = evaluate_expressions_to_arrays(
+ ordering.iter().map(|e| &e.expr),
+ &batch,
+ )?;
+
+ indices.iter_mut().for_each(|v| v.clear());
+ let sort_options: Vec<SortOptions> =
Review Comment:
I could create this at construction time to avoid re-creating on every
invocation of `partition_iter`?
##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -836,17 +907,52 @@ impl BatchPartitioner {
Ok(it)
}
+ /// This function takes the `arrays` associated with the evaluated
expressions for the ordering, split points and sort options, and indices array
+ /// Then for every row, creates the "row key" based on the given ordering
for the range, and binary searches through the split points to find the
appropriate partition index
+ /// That partition index is associated with the array in `indices`, which
is given the row index, meaning that the row is sent to the partition at that
index
Review Comment:
This comment might not be the best right now lol - I'll try to sleep on this
and come may edit with something more descriptive tomorrow, open to any
suggestions :)
##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -1441,7 +1547,7 @@ impl ExecutionPlan for RepartitionExec {
}
Partitioning::Range(_) => {
// Range partitioning optimizer propagation is tracked in
- // https://github.com/apache/datafusion/issues/22395
+ // https://github.com/apache/datafusion/issues/23230
Review Comment:
Intentionally left these un-implemented and created #23230 for them if
that's OK!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]