alamb commented on code in PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#discussion_r2042923246


##########
datafusion/physical-plan/src/topk/mod.rs:
##########
@@ -202,27 +205,116 @@ impl TopK {
             })
             .collect::<Result<Vec<_>>>()?;
 
+        // Selected indices in the input batch.
+        // Some indices may be pre-filtered if they exceed the heap’s current 
max value.
+
+        let mut selected_rows = None;
+
+        // If the heap doesn't have k elements yet, we can't create thresholds
+        if let Some(max_row) = self.heap.max() {
+            // Get the batch that contains the max row
+            let batch_entry = match self.heap.store.get(max_row.batch_id) {
+                Some(entry) => entry,
+                None => return internal_err!("Invalid batch ID in TopKRow"),
+            };

Review Comment:
   I think you can write this a bit more succinctly
   
   ```suggestion
               let Some(entry) = self.heap.store.get(max_row.batch_id) else {
                   return internal_err!("Invalid batch ID in TopKRow"),
               };
   ```



##########
benchmarks/bench.sh:
##########
@@ -212,6 +212,10 @@ main() {
                     # same data as for tpch
                     data_tpch "1"
                     ;;
+                sort_tpch_limit)

Review Comment:
   Can we please also add a description of this benchmark in 
https://github.com/apache/datafusion/tree/main/benchmarks#benchmarks ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to