alamb commented on issue #21231: URL: https://github.com/apache/datafusion/issues/21231#issuecomment-4177839220
From what I can tell, the core issue is that CaseWhen currently assumes input column references can be discovered by finding built-in `Column` physical exprs. That is not true for @rluvaton’s system, and lambda scope makes it even trickier. I think there are two possible solutions: # Clean solution The cleanest fix seems to be adding an API to [PhysicalExpr](https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html) to report in-scope input column usage, along the lines of what @pepijnve proposed in [this comment](https://github.com/apache/datafusion/issues/21231#issuecomment-4151111735) (column_index() / required_column_indices()). That is probably a non-trivial change, since I believe there are several places in DataFusion that downcast to `Column`. # Possible workaround One thing that might work for @rluvaton 's case is to make his `CustomExpr` is to report a real `Column` child for any expression that refers to the outer schema ```rust struct MyCustomColumn { // if this expression refers to a outer schema column (rather than something // bound in the lambda outer_column: Option<Arc<dyn PhysicalExpr>>, } impl PhysicalExpr MyCustomColumn .. fn children(&self) -> Vec<&Arc<dyn PhysicalExpr>> { // if the expression in the Lambda refers to an outer schema column, return // a reference to that column here: if let Some(outer_column) = self.outer_column { vec![outer_column] } else { vec![]; } } fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> { // if this is just a wrapper, use the inenr column if let Some(outer_column) = self.outer_column { outer_column.evaluate(batch) } else { // otherwise, do whatever is required for columns that are bound in lambda ... } } } ``` That way when traversing via the [PhysicalExpr::children](https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html#tymethod.children) the rest of DataFusion would still see a `Column` expr for actual (outer) schema references This would require some sort of analysis pass on the lambda to identify the outer column `Column` references, but I suspect that already exists somewhere -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
