saadtajwar commented on code in PR #23231:
URL: https://github.com/apache/datafusion/pull/23231#discussion_r3488970017


##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -826,6 +863,40 @@ impl BatchPartitioner {
                     // Finished building index-arrays for output partitions
                     timer.done();
 
+                    let partitioned_batches =
+                        Self::partition_grouped_take(&batch, indices, 
&self.timer)?;
+
+                    Box::new(partitioned_batches.into_iter())
+                }
+                BatchPartitionerState::Range {
+                    ordering,
+                    split_points,
+                    indices,
+                    partition_buffer,
+                } => {
+                    // Tracking time required for distributing indexes across 
output partitions
+                    let timer = self.timer.timer();
+
+                    let arrays = evaluate_expressions_to_arrays(
+                        ordering.iter().map(|e| &e.expr),
+                        &batch,
+                    )?;
+
+                    indices.iter_mut().for_each(|v| v.clear());
+                    let sort_options: Vec<SortOptions> =

Review Comment:
   I could create this at construction time to avoid re-creating on every 
invocation of `partition_iter`?



##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -836,17 +907,52 @@ impl BatchPartitioner {
         Ok(it)
     }
 
+    /// This function takes the `arrays` associated with the evaluated 
expressions for the ordering, split points and sort options, and indices array
+    /// Then for every row, creates the "row key" based on the given ordering 
for the range, and binary searches through the split points to find the 
appropriate partition index
+    /// That partition index is associated with the array in `indices`, which 
is given the row index, meaning that the row is sent to the partition at that 
index

Review Comment:
   This comment might not be the best right now lol - I'll try to sleep on this 
and come may edit with something more descriptive tomorrow, open to any 
suggestions :) 



##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -1441,7 +1547,7 @@ impl ExecutionPlan for RepartitionExec {
             }
             Partitioning::Range(_) => {
                 // Range partitioning optimizer propagation is tracked in
-                // https://github.com/apache/datafusion/issues/22395
+                // https://github.com/apache/datafusion/issues/23230

Review Comment:
   Intentionally left these un-implemented and created #23230 for them if 
that's OK!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to