alamb commented on code in PR #13826: URL: https://github.com/apache/datafusion/pull/13826#discussion_r1893395562
########## datafusion/physical-plan/src/sorts/sort_preserving_merge.rs: ########## @@ -105,6 +119,14 @@ impl SortPreservingMergeExec { } /// Sets the selection strategy of tied winners of the loser tree algorithm + /// Review Comment: this is the actual public facing API descriptin change ########## datafusion/physical-plan/src/sorts/sort_preserving_merge.rs: ########## @@ -105,6 +119,14 @@ impl SortPreservingMergeExec { } /// Sets the selection strategy of tied winners of the loser tree algorithm + /// + /// When true (the default) equal output rows are placed in the merged + /// stream when ready, which is faster but not stable (can vary from + /// run to run). Review Comment: You are right -- I double checked the other parts of the code -- I think I misudnerstood the root need for this option. I removed the discssion of polling when not ready. ########## datafusion/physical-plan/src/sorts/merge.rs: ########## @@ -99,19 +99,25 @@ pub(crate) struct SortPreservingMergeStream<C: CursorValues> { /// Configuration parameter to enable round-robin selection of tied winners of loser tree. /// - /// To address the issue of unbalanced polling between partitions due to tie-breakers being based - /// on partition index, especially in cases of low cardinality, we are making changes to the winner - /// selection mechanism. Previously, partitions with smaller indices were consistently chosen as the winners, - /// leading to an uneven distribution of polling. This caused upstream operator buffers for the other partitions - /// to grow excessively, as they continued receiving data without consuming it. + /// This option controls the tie-breaker strategy and attempts to avoid the Review Comment: I found this related description and updated it to describe the current state of the code, rather than the changes that were made previously -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org