Re: [PR] Enable repartitioning on MemTable. [datafusion]

via GitHub Tue, 06 May 2025 11:17:30 -0700


wiedld commented on code in PR #15409:
URL: https://github.com/apache/datafusion/pull/15409#discussion_r2076020887



##########
datafusion/datasource/src/memory.rs:
##########
@@ -723,6 +761,222 @@ impl MemorySourceConfig {
     pub fn original_schema(&self) -> SchemaRef {
         Arc::clone(&self.schema)
     }
+
+    /// Repartition while preserving order.
+    ///
+    /// Returns `Ok(None)` if cannot fulfill the requested repartitioning, such
+    /// as having too few batches to fulfill the `target_partitions` or if 
unable
+    /// to preserve output ordering.
+    fn repartition_preserving_order(
+        &self,
+        target_partitions: usize,
+        output_ordering: LexOrdering,
+    ) -> Result<Option<Vec<Vec<RecordBatch>>>> {
+        if !self.eq_properties().ordering_satisfy(&output_ordering) {
+            Ok(None)
+        } else if self.partitions.len() == 1 {
+            self.repartition_evenly_by_size(target_partitions)
+        } else {

Review Comment:
   Thank you for the code review.
   
   After adding an additional test case targeting the `self.partitions.len() == 
1`, and ended up removing this conditional due to this: 
https://github.com/apache/datafusion/pull/15409/commits/57147add2bb1d540334b08639504b80930c1c982
   
   Additionally, I added code comments about how the max heap is being used: 
https://github.com/apache/datafusion/pull/15409/commits/12e6dfb1dec6038b47a63019ee6965ed626bc68e
   
   I believe the `RePartition` struct already documents what prop is the max 
heap cmp, vs ordering.
   
   Please let me know if you spot anything else. Thank you.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Enable repartitioning on MemTable. [datafusion]

Reply via email to