adriangb opened a new issue, #19187:
URL: https://github.com/apache/datafusion/issues/19187

   Adapted from 
https://github.com/apache/datafusion/pull/19130#issuecomment-3621848790:
   
   It seems to me that we could improve the cohesiveness of the volatility API 
between logical and physical expressions and make it easier to write optimizer 
rules, etc. that take this into account if we formally introduced the concept 
of "expression volatility". It's somewhat implicit as of now, and how it 
relates to function volatility is not clear. My goal would be to be able to 
formally express that an expression is "constant" in terms of volatility. 
   
   For physical expressions we currently have `fn is_volatile(expr: &Arc<dyn 
PhysicalExpression>) -> bool`, but that doesn't tell you if it's a "stable" 
expression (like `column > 123`) or a "constant" expression (`456 > 123`).
   
   Looking at the logical expression const evaluator it seems like it does 
manual matching: 
https://github.com/apache/datafusion/blob/deaccb784ee1223bf9eed36562c5775aaedebc6b/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L637-L682.
   
   It looks like we have a `Volatility` targeted at functions:
   
   
https://github.com/apache/datafusion/blob/c479deeb668cc444936bbfa66100aef9b2013b2e/datafusion/expr-common/src/signature.rs#L53-L85
   
   But it does not have a "constant" option (nor would that really make sense 
for a function). `Constant` is not the same as immutable, as per the definition 
above `Immutable`:
   
   > Always returns the same output when given the same input.
   
   `Constant` would be:
   
   > Always returns the same output regardless of the input.
   
   (Generally because it has no input but I guess you could have an expression 
that takes an input but returns a constant / ignores the input?)
   
   Which makes me wonder if it wouldn't make sense to an enum along the lines 
of:
   
   ```rust
   pub enum ExprVolatility {
       Constant,
       Immutable,
       Stable,
       Volatile,
   }
   ```
   
   And augment `PhysicalExpression` and `Expr` with methods to calculate the 
volatility:
   
   ```rust
   impl Expr {
       pub fn node_volatility(&self) -> ExprVolatility {
           match self { ... }
       }
       pub fn volatility(&self) -> ExprVolatility {
           self.apply(...)  // recursive
       }
   }
   ```
   
   ```rust
   pub trait PhysicalExpr {
       fn node_volatility(&self) -> ExprVolatility {
           ExprVolatility::Volatilve
       }
       fn volatility(&self) -> ExprVolatility {
           self.apply(...)  // recursive
       }
   }
   ```
   
   This seems like it would be required for `PhysicalExpr` to be able to 
participate in volatility calculations. For `Expr` it would be more of a matter 
of having it all in once place that can be re-used by other optimizers, user 
code, etc.
   
   The main alternative I see to this is to define "input" as "columns/data" 
(as opposed to e.g. `now()`) such that "immutable functions that do not have 
any columns as children are considered constant", but I'm not sure if that is 
correct enough. Or we can leave it implicit as it is now w/ logic for constant 
sub expression detection scattered.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to