alamb commented on code in PR #13826:
URL: https://github.com/apache/datafusion/pull/13826#discussion_r1893395562


##########
datafusion/physical-plan/src/sorts/sort_preserving_merge.rs:
##########
@@ -105,6 +119,14 @@ impl SortPreservingMergeExec {
     }
 
     /// Sets the selection strategy of tied winners of the loser tree algorithm
+    ///

Review Comment:
   this is the actual public facing API descriptin change



##########
datafusion/physical-plan/src/sorts/sort_preserving_merge.rs:
##########
@@ -105,6 +119,14 @@ impl SortPreservingMergeExec {
     }
 
     /// Sets the selection strategy of tied winners of the loser tree algorithm
+    ///
+    /// When true (the default) equal output rows are placed in the merged
+    /// stream when ready, which is faster but not stable (can vary from
+    /// run to run).

Review Comment:
   You are right -- I double checked the other parts of the code -- I think I 
misudnerstood the root need for this option.
   
   I removed the discssion of polling when not ready.



##########
datafusion/physical-plan/src/sorts/merge.rs:
##########
@@ -99,19 +99,25 @@ pub(crate) struct SortPreservingMergeStream<C: 
CursorValues> {
 
     /// Configuration parameter to enable round-robin selection of tied 
winners of loser tree.
     ///
-    /// To address the issue of unbalanced polling between partitions due to 
tie-breakers being based
-    /// on partition index, especially in cases of low cardinality, we are 
making changes to the winner
-    /// selection mechanism. Previously, partitions with smaller indices were 
consistently chosen as the winners,
-    /// leading to an uneven distribution of polling. This caused upstream 
operator buffers for the other partitions
-    /// to grow excessively, as they continued receiving data without 
consuming it.
+    /// This option controls the tie-breaker strategy and attempts to avoid the

Review Comment:
   I found this related description and updated it to describe the current 
state of the code, rather than the changes that were made previously



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to