jychen7 commented on code in PR #6014:
URL: https://github.com/apache/arrow-datafusion/pull/6014#discussion_r1167685976
##########
datafusion/core/src/physical_plan/sorts/merge.rs:
##########
@@ -210,7 +218,14 @@ impl<C: Cursor> SortPreservingMergeStream<C> {
             self.loser_tree_adjusted = false;
             self.in_progress.push_row(stream_idx);
             if self.in_progress.len() < self.batch_size {
-                continue;
+                match self.fetch {
Review Comment:
> The loser tree, because it provides the loser of each match, will contain
> repeat nodes. Since the heap is a data-storing structure, it won't contain
> these redundancies.
>
> Another difference between the two is that the loser tree must be a full
> binary tree (because it is a type of tournament tree), but the heap does not
> necessarily have to be binary.

https://stackoverflow.com/questions/7685943/whats-the-difference-between-a-heap-and-a-loser-tree-in-external-sorting
Based on the above, do we need to switch from the loser tree to a heap in
order to make use of the pushed-down `fetch` and reduce memory usage?
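
To make the question concrete, here is a minimal sketch of what a heap-based
k-way merge with a `fetch` limit could look like. It is illustrative only: it
uses the standard library's `BinaryHeap` over plain `i32` inputs, and
`merge_with_fetch` is a hypothetical helper, not anything in
`SortPreservingMergeStream`. Note that in this sketch the heap only ever holds
one entry per input, so whether it would really use less memory than the loser
tree is exactly what I am unsure about.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge already-sorted inputs in ascending order, stopping after `fetch`
/// rows when a limit is given.
/// Hypothetical helper for illustration; not part of DataFusion.
fn merge_with_fetch(inputs: &[Vec<i32>], fetch: Option<usize>) -> Vec<i32> {
    // One heap entry per input: (next value, input index, offset in that input).
    // `Reverse` turns the std max-heap into a min-heap.
    let mut heap = BinaryHeap::new();
    for (idx, input) in inputs.iter().enumerate() {
        if let Some(&value) = input.first() {
            heap.push(Reverse((value, idx, 0usize)));
        }
    }

    // No limit means "merge everything".
    let limit = fetch.unwrap_or(usize::MAX);
    let mut out = Vec::new();
    while out.len() < limit {
        // Pop the smallest head; stop when every input is exhausted.
        let Some(Reverse((value, idx, pos))) = heap.pop() else {
            break;
        };
        out.push(value);
        // Refill the heap from the input that just produced a row.
        if let Some(&next) = inputs[idx].get(pos + 1) {
            heap.push(Reverse((next, idx, pos + 1)));
        }
    }
    out
}
```

With three sorted inputs and `fetch = Some(4)`, the loop stops after four rows
and never reads the remaining values from any input.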