alamb commented on code in PR #6260:
URL: https://github.com/apache/arrow-datafusion/pull/6260#discussion_r1187729554
##########
datafusion/core/src/physical_plan/sorts/merge.rs:
##########
@@ -242,22 +260,61 @@ impl<C: Cursor> SortPreservingMergeStream<C> {
}
}
+ /// Find the leaf node index in the loser tree for the given cursor index
+ ///
+ /// Note that this is not necessarily a leaf node in the tree, but it can
+ /// also be a half-node (a node with only one child). This happens when the
+ /// number of cursors/streams is not a power of two. Thus, the loser tree
+ /// will be unbalanced, but it will still work correctly.
+ ///
+ /// For example, with 5 streams, the loser tree will look like this:
+ ///
+ /// ```text
+ /// 0 (winner)
+ ///
+ /// 1
+ /// / \
+ /// 2 3
+ /// / \ / \
+ /// 4 | | |
+ /// / \ | | |
+ /// -+---+--+---+---+---- Below is not a part of loser tree
+ /// S3 S4 S0 S1 S2
+ /// ```
+ ///
+ /// S0, S1, ... S4 are the streams (read: stream at index 0, stream at
+ /// index 1, etc.)
+ ///
+ /// Zooming in at node 2 in the loser tree as an example, we can see that
+ /// it takes as input the next item at (S0) and the loser of (S3, S4).
+ ///
+ #[inline]
+ fn lt_leaf_node_index(&self, cursor_index: usize) -> usize {
+ (self.cursors.len() + cursor_index) / 2
+ }
+
+ /// Find the parent node index for the given node index
Review Comment:
👍 I like refactoring of this functionality as a way to document what is
going on
##########
datafusion/core/src/physical_plan/sorts/sort_preserving_merge.rs:
##########
@@ -506,6 +506,71 @@ mod tests {
assert_batches_eq!(exp, collected.as_slice());
}
+ #[tokio::test]
+ async fn test_merge_five_partitions() {
Review Comment:
I wonder if you can leave some comments about why this test was added / what
it is covering to help future readers?
I think the coverage of merging is pretty good with this:
https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/merge_fuzz.rs
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]