Dandandan commented on code in PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#discussion_r2041528193


##########
datafusion/physical-plan/src/topk/mod.rs:
##########
@@ -202,27 +204,99 @@ impl TopK {
             })
             .collect::<Result<Vec<_>>>()?;
 
+        // selected indices
+        let mut selected_rows = None;
+
+        // If the heap doesn't have k elements yet, we can't create thresholds
+        if let Some(max_row) = self.heap.max() {
+            // Get the batch that contains the max row
+            let batch_entry = match self.heap.store.get(max_row.batch_id) {
+                Some(entry) => entry,
+                None => return internal_err!("Invalid batch ID in TopKRow"),
+            };
+
+            // Extract threshold values for each sort expression
+            // TODO: create a filter for each key that respects lexical 
ordering

Review Comment:
   Yes, it might very well be evaluating all keys is more expensive than 
converting to rows. 
   
   It might benefit from optimizations like in 
https://github.com/apache/datafusion/pull/15694 to avoid evaluating the 
expression for values already filtered out...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to