suibianwanwank commented on code in PR #15697:
URL: https://github.com/apache/datafusion/pull/15697#discussion_r2042485745
##########
datafusion/physical-plan/src/topk/mod.rs:
##########
@@ -202,27 +204,121 @@ impl TopK {
})
.collect::<Result<Vec<_>>>()?;
+ // Selected indices in the input batch.
+ // Some indices may be pre-filtered if they exceed the heap’s current max value.
+
+ let mut selected_rows = None;
+
+ // If the heap doesn't have k elements yet, we can't create thresholds
+ if let Some(max_row) = self.heap.max() {
+ // Get the batch that contains the max row
+ let batch_entry = match self.heap.store.get(max_row.batch_id) {
+ Some(entry) => entry,
+ None => return internal_err!("Invalid batch ID in TopKRow"),
+ };
+
+ // Extract threshold values for each sort expression
+ // TODO: create a filter for each key that respects lexical ordering
+ // in the form of col0 < threshold0 || col0 == threshold0 && (col1 < threshold1 || ...)
+ // This could use BinaryExpr to benefit from short circuiting and early evaluation
+ // https://github.com/apache/datafusion/issues/15698
+ // Extract the value for this column from the max row
+ let expr = Arc::clone(&self.expr[0].expr);
+ let value = expr.evaluate(&batch_entry.batch.slice(max_row.index, 1))?;
+
+ // Convert to scalar value - should be a single value since we're evaluating on a single row batch
+ let threshold = Scalar::new(value.to_array(1)?);
Review Comment:
Do we need to skip the case where the threshold is NULL here? `NULL > any number`
never evaluates to true.
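
To illustrate the concern, here is a minimal standalone sketch. It assumes the pre-filter comparison follows SQL three-valued logic, and it models nullable values with `Option<i64>` rather than Arrow arrays. The names `sql_lt`, `prefilter`, and `prefilter_guarded` are hypothetical and are not part of this PR or of DataFusion.

```rust
/// SQL-style three-valued `<`: the result is unknown (`None`)
/// when either side is NULL.
fn sql_lt(value: Option<i64>, threshold: Option<i64>) -> Option<bool> {
    match (value, threshold) {
        (Some(v), Some(t)) => Some(v < t),
        _ => None,
    }
}

/// Returns the indices of rows whose comparison is definitely true.
/// Unknown (NULL) results are dropped, as a filter would drop them.
fn prefilter(rows: &[Option<i64>], threshold: Option<i64>) -> Vec<usize> {
    rows.iter()
        .enumerate()
        .filter(|(_, v)| sql_lt(**v, threshold) == Some(true))
        .map(|(i, _)| i)
        .collect()
}

/// Hypothetical guard: skip pre-filtering when the threshold is NULL.
/// `None` means "no selection, keep every row", mirroring
/// `selected_rows = None` in the diff above.
fn prefilter_guarded(rows: &[Option<i64>], threshold: Option<i64>) -> Option<Vec<usize>> {
    threshold.map(|t| prefilter(rows, Some(t)))
}
```

With a NULL threshold, `prefilter` selects no rows at all, while `prefilter_guarded` returns `None` so the caller can fall back to processing the whole batch. Whether NULL rows should rank before or after the threshold would still depend on the sort options (nulls first or last), which this sketch does not model.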
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]