wiedld commented on code in PR #15409: URL: https://github.com/apache/datafusion/pull/15409#discussion_r2076020887
########## datafusion/datasource/src/memory.rs: ########## @@ -723,6 +761,222 @@ impl MemorySourceConfig { pub fn original_schema(&self) -> SchemaRef { Arc::clone(&self.schema) } + + /// Repartition while preserving order. + /// + /// Returns `Ok(None)` if cannot fulfill the requested repartitioning, such + /// as having too few batches to fulfill the `target_partitions` or if unable + /// to preserve output ordering. + fn repartition_preserving_order( + &self, + target_partitions: usize, + output_ordering: LexOrdering, + ) -> Result<Option<Vec<Vec<RecordBatch>>>> { + if !self.eq_properties().ordering_satisfy(&output_ordering) { + Ok(None) + } else if self.partitions.len() == 1 { + self.repartition_evenly_by_size(target_partitions) + } else { Review Comment: Thank you for the code review. After adding an additional test case targeting the `self.partitions.len() == 1`, and ended up removing this conditional due to this: https://github.com/apache/datafusion/pull/15409/commits/57147add2bb1d540334b08639504b80930c1c982 Additionally, I added code comments about how the max heap is being used: https://github.com/apache/datafusion/pull/15409/commits/12e6dfb1dec6038b47a63019ee6965ed626bc68e I believe the `RePartition` struct already documents what prop is the max heap cmp, vs ordering. Please let me know if you spot anything else. Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org