alamb commented on issue #21231:
URL: https://github.com/apache/datafusion/issues/21231#issuecomment-4177839220

   From what I can tell, the core issue is that CaseWhen currently assumes 
input column references can be discovered by finding built-in `Column` physical 
exprs. That is not true for @rluvaton’s system, and lambda scope makes it even 
trickier.
   
   I think there are two possible solutions:
   
   # Clean solution
   
   The cleanest fix seems to be adding an API to 
[PhysicalExpr](https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html)
 to report in-scope input column usage, along the lines of what @pepijnve 
proposed in [this 
comment](https://github.com/apache/datafusion/issues/21231#issuecomment-4151111735)
 (column_index() / required_column_indices()).
   
   That is probably a non-trivial change, since I believe there are several 
places in DataFusion that downcast to `Column`.
   
   # Possible workaround
   
   One thing that might work for @rluvaton 's case is to make his `CustomExpr` 
is to report a real `Column` child for any expression that refers to the outer 
schema 
   
   ```rust
   struct MyCustomColumn {
     // if this expression refers to a outer schema column (rather than 
something
     // bound in the lambda
     outer_column: Option<Arc<dyn PhysicalExpr>>, 
   }
   
   impl PhysicalExpr MyCustomColumn
    ..
     fn children(&self) -> Vec<&Arc<dyn PhysicalExpr>> {
       // if the expression in the Lambda refers to an outer schema column, 
return
       // a reference to that column here:
       if let Some(outer_column) = self.outer_column {
         vec![outer_column]
       } else {
         vec![];
       }
     }
     
     fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
       // if this is just a wrapper, use the inenr column
       if let Some(outer_column) = self.outer_column {
        outer_column.evaluate(batch)
       } else {
         // otherwise, do whatever is required for columns that are bound in 
lambda
         ...
       }
     }
         
   }
   ```
   
   That way when traversing via the 
[PhysicalExpr::children](https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html#tymethod.children)
 the rest of DataFusion would still see a `Column` expr for actual (outer) 
schema references
   
   This would require some sort of analysis pass on the lambda to identify the  
outer column `Column` references, but I suspect that already exists somewhere


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to