Dandandan commented on code in PR #21726:
URL: https://github.com/apache/datafusion/pull/21726#discussion_r3106956973


##########
datafusion/optimizer/src/optimize_projections/required_indices.rs:
##########
@@ -105,44 +104,59 @@ impl RequiredIndices {
     /// Adds the indices of the fields referred to by the given expression
     /// `expr` within the given schema (`input_schema`).
     ///
-    /// Self is NOT compacted (and thus this method is not pub)
+    /// Self is NOT compacted (duplicate indices are removed by a subsequent
+    /// [`Self::compact`] call), and thus this method is not pub.
     ///
     /// # Parameters
     ///
     /// * `input_schema`: The input schema to analyze for index requirements.
     /// * `expr`: An expression for which we want to find necessary field 
indices.
     fn add_expr(&mut self, input_schema: &DFSchemaRef, expr: &Expr) {
-        // TODO could remove these clones (and visit the expression directly)
-        let mut cols = expr.column_refs();
-        // Get outer-referenced (subquery) columns:
-        outer_columns(expr, &mut cols);
-        self.indices.reserve(cols.len());
-        for col in cols {
-            if let Some(idx) = input_schema.maybe_index_of_column(col) {
-                self.indices.push(idx);
+        // `apply` does not descend into subqueries, so recurse manually to
+        // handle those cases.
+        expr.apply(|e| {
+            match e {
+                Expr::Column(c) | Expr::OuterReferenceColumn(_, c) => {
+                    if let Some(idx) = input_schema.maybe_index_of_column(c) {
+                        self.indices.push(idx);
+                    }
+                }
+                Expr::ScalarSubquery(sub) => {
+                    self.add_exprs(input_schema, &sub.outer_ref_columns);
+                }
+                Expr::Exists(ex) => {
+                    self.add_exprs(input_schema, 
&ex.subquery.outer_ref_columns);
+                }
+                Expr::InSubquery(isq) => {
+                    self.add_exprs(input_schema, 
&isq.subquery.outer_ref_columns);
+                }
+                _ => {}
             }
+            Ok(TreeNodeRecursion::Continue)
+        })
+        .expect("traversal is infallible");
+    }
+
+    /// Like [`Self::add_expr`], but for multiple expressions.
+    fn add_exprs<'a>(
+        &mut self,
+        input_schema: &DFSchemaRef,
+        exprs: impl IntoIterator<Item = &'a Expr>,
+    ) {
+        for expr in exprs {

Review Comment:
   I wonder if we could use some more `extend` approaches (reducing allocations 
/ capacity checks) a bit better in these APIs - I think I often see `Vec` 
reallocations come up quite a bit because of `Vec:push` 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to