ozankabak commented on code in PR #5419:
URL: https://github.com/apache/arrow-datafusion/pull/5419#discussion_r1120950514
##########
datafusion/physical-expr/src/utils.rs:
##########
@@ -235,6 +239,80 @@ pub fn ordering_satisfy_concrete<F: FnOnce() -> EquivalenceProperties>(
}
}
+/// Extract referenced [`Column`]s within a [`PhysicalExpr`].
+///
+/// This works recursively.
+pub fn get_phys_expr_columns(pred: &Arc<dyn PhysicalExpr>) -> HashSet<Column> {
+ let mut rewriter = ColumnCollector::default();
Review Comment:
Interesting! The same need (collecting columns) came up in the SHJ
implementation, so we used this more lightweight recursion:
```rust
fn collect_columns_recursive(expr: &Arc<dyn PhysicalExpr>, columns: &mut Vec<Column>) {
    if let Some(column) = expr.as_any().downcast_ref::<Column>() {
        // Linear scan for duplicates; cheap for small column counts.
        if !columns.iter().any(|c| c.eq(column)) {
            columns.push(column.clone());
        }
    }
    // Recurse into child expressions.
    expr.children()
        .iter()
        .for_each(|e| collect_columns_recursive(e, columns))
}

fn collect_columns(expr: &Arc<dyn PhysicalExpr>) -> Vec<Column> {
    let mut columns = vec![];
    collect_columns_recursive(expr, &mut columns);
    columns
}
```
We used a `Vec` instead of a `HashSet` due to anticipated small sizes, but
the code is essentially the same 🙂
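For illustration, here is a self-contained sketch of the same recursion over a toy expression tree. The `Expr` enum and `demo_column_names` helper are made up for this example (DataFusion's real `PhysicalExpr` trait is not used); only the traversal-and-dedup shape mirrors the snippet above.

```rust
// Toy expression tree standing in for DataFusion's `PhysicalExpr`.
#[derive(Clone, PartialEq, Debug)]
struct Column {
    name: String,
}

enum Expr {
    Col(Column),
    BinaryOp { left: Box<Expr>, right: Box<Expr> },
}

fn collect_columns_recursive(expr: &Expr, columns: &mut Vec<Column>) {
    match expr {
        // Record each column the first time it is seen; a linear scan
        // through the `Vec` is cheap for small column counts.
        Expr::Col(column) => {
            if !columns.iter().any(|c| c == column) {
                columns.push(column.clone());
            }
        }
        // Recurse into children, mirroring `expr.children()`.
        Expr::BinaryOp { left, right } => {
            collect_columns_recursive(left, columns);
            collect_columns_recursive(right, columns);
        }
    }
}

fn collect_columns(expr: &Expr) -> Vec<Column> {
    let mut columns = vec![];
    collect_columns_recursive(expr, &mut columns);
    columns
}

// Builds (a = b) AND (a = c); `a` appears twice but is collected once.
fn demo_column_names() -> Vec<String> {
    let col = |n: &str| Expr::Col(Column { name: n.to_string() });
    let expr = Expr::BinaryOp {
        left: Box::new(Expr::BinaryOp {
            left: Box::new(col("a")),
            right: Box::new(col("b")),
        }),
        right: Box::new(Expr::BinaryOp {
            left: Box::new(col("a")),
            right: Box::new(col("c")),
        }),
    };
    collect_columns(&expr).into_iter().map(|c| c.name).collect()
}

fn main() {
    println!("{:?}", demo_column_names()); // ["a", "b", "c"]
}
```

Note that insertion order is preserved, which a `HashSet` would not guarantee.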
This makes me think that doing a comprehensive code review and
collecting/coalescing/documenting utilities such as this one could
simplify the codebase, and would be a worthy pursuit.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]