sdf-jkl commented on code in PR #18789:
URL: https://github.com/apache/datafusion/pull/18789#discussion_r2611900925


##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1960,6 +1964,41 @@ impl<S: SimplifyInfo> TreeNodeRewriter for Simplifier<'_, S> {
                 }))
             }
 
+            // =======================================
+            // preimage_in_comparison
+            // =======================================
+            //
+            // For case:
+            // date_part(expr as 'YEAR') op literal
+            //
+            // Background:
+            // Datasources such as Parquet can prune partitions using simple predicates,
+            // but they cannot do so for complex expressions.
+            // For a complex predicate like `date_part('YEAR', c1) < 2000`, pruning is not possible.
+            // After rewriting it to `c1 < 2000-01-01`, pruning becomes feasible.
+            Expr::BinaryExpr(BinaryExpr { left, op, right })
+                if is_scalar_udf_expr_and_support_preimage_in_comparison_for_binary(
+                    info, &left, op, &right,
+                ) =>
+            {
+                preimage_in_comparison_for_binary(info, *left, *right, op)?
+            }

Review Comment:
   Actually, after tinkering with the code some more, I realized that in this snippet `rewrite_with_preimage` takes the wrong expression as input (the `Scalar` literal instead of the left column).
   
   However, we also can't just pass the `left` expression. `left` is a `ScalarUDFExpression`, so we still need to extract the column expression from the UDF's `args`.
   
   We currently do that extraction inside the `preimage` call, and I believe that is the most convenient place for it: different functions take different arguments, so the caller can't know which argument is the column expression.
   
   I propose changing the `preimage` function signature to:
   ```rust
   fn preimage(
           &self,
           _args: &[Expr],
           _lit_expr: &Expr,
           _info: &dyn SimplifyInfo,
       ) -> Result<(Option<Interval>, Expr)> 
   ```
   But if we go this route the function loses its straightforwardness. 
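   
   To make the trade-off concrete, here is a minimal sketch of what a `preimage` with that shape would have to do for the `date_part('YEAR', c1)` case. Everything in it is an assumption for illustration: `Expr` and `Interval` are toy stand-ins rather than DataFusion's types, `year_preimage` is a hypothetical name, dates are plain ISO strings, and the return type is simplified to an `Option` of a pair instead of `Result<(Option<Interval>, Expr)>`.
   
   ```rust
   // Toy stand-ins for illustration only; these are NOT DataFusion's types.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       Int(i32),
       Str(String),
   }
   
   /// Half-open range `[lower, upper)` of ISO-8601 dates (toy representation).
   #[derive(Debug, Clone, PartialEq)]
   struct Interval {
       lower: String,
       upper: String,
   }
   
   /// Hypothetical preimage for `date_part('YEAR', col)` compared with `lit`.
   ///
   /// It does two jobs at once, which is the loss of straightforwardness:
   /// 1. computes the range of dates whose year equals the literal, and
   /// 2. picks the column expression out of the function arguments.
   fn year_preimage(args: &[Expr], lit: &Expr) -> Option<(Interval, Expr)> {
       match (args, lit) {
           // Expect exactly two arguments: the part name, then the column.
           ([Expr::Str(part), col], Expr::Int(year))
               if part.eq_ignore_ascii_case("YEAR") && matches!(col, Expr::Column(_)) =>
           {
               let interval = Interval {
                   lower: format!("{:04}-01-01", year),
                   upper: format!("{:04}-01-01", year + 1),
               };
               Some((interval, col.clone()))
           }
           // Any other shape: no preimage, so the caller leaves the predicate alone.
           _ => None,
       }
   }
   ```
   
   With the interval and the column both in hand, the caller could rewrite `date_part('YEAR', c1) < 2000` to `c1 < 2000-01-01` by comparing the column against the interval's lower bound.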
   
   Basically, I'm looking for a way to extract the column expression argument from any `ScalarUDFExpression`, no matter which function is used.
   
   Maybe a new UDF method, `.get_column()` or something similar...
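   
   A rough sketch of that alternative, under the same caveats: the `Expr` enum below is a toy stand-in rather than DataFusion's type, and `column_arg_index` and `get_column` are hypothetical names, not existing APIs. The idea is that each function only declares which argument position holds the column, and one shared helper does the extraction.
   
   ```rust
   // Toy stand-in for illustration only; this is NOT DataFusion's `Expr`.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       Literal(String),
       ScalarFunction { name: String, args: Vec<Expr> },
   }
   
   /// Hypothetical per-function hook: the argument position of the column.
   /// In a real design this would more likely be a method on the UDF itself.
   fn column_arg_index(func_name: &str) -> Option<usize> {
       match func_name {
           // date_part('YEAR', c1): the column is the second argument.
           "date_part" => Some(1),
           // A hypothetical single-argument function such as floor(c1).
           "floor" => Some(0),
           // Unknown function: no preimage support assumed.
           _ => None,
       }
   }
   
   /// Hypothetical `get_column`-style helper: returns the column argument of
   /// a scalar function call, or `None` if the expression is not a supported
   /// call or the argument in that position is not a plain column.
   fn get_column(expr: &Expr) -> Option<&Expr> {
       match expr {
           Expr::ScalarFunction { name, args } => {
               let idx = column_arg_index(name)?;
               let arg = args.get(idx)?;
               if matches!(arg, Expr::Column(_)) {
                   Some(arg)
               } else {
                   None
               }
           }
           _ => None,
       }
   }
   ```
   
   With something like this, `preimage` could go back to returning only the interval, and the simplifier would call the helper on `left` to get the expression to compare against.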



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

