alamb commented on a change in pull request #1719:
URL: https://github.com/apache/arrow-datafusion/pull/1719#discussion_r796066392
##########
File path: datafusion/src/optimizer/simplify_expressions.rs
##########
@@ -1291,6 +1337,11 @@ mod tests {
#[test]
fn simplify_expr_case_when_then_else() {
+ // CASE WHERE c2 != false THEN "ok" == "not_ok" ELSE c2 == true
Review comment:
🤔 I do wonder given the nullability concerns, if in addition to
"simplify" which seeks to retain the same semantics, there could be something
like:
```rust
/// returns true if this expr *always* returns null or false
/// (aka would filter out all rows from a query)
fn is_null_or_false(expr: &Expr) -> bool {
...
}
```
##########
File path: datafusion/src/optimizer/simplify_expressions.rs
##########
@@ -662,6 +664,48 @@ impl<'a> ExprRewriter for Simplifier<'a> {
_ => unreachable!(),
},
+ //
+ // Rules for Case
+ //
+
+ // CASE
+ // WHEN X THEN A
+ // WHEN Y THEN B
+ // ...
+ // ELSE Q
+ // END
+ //
+ // ---> (X AND A) OR (Y AND B AND NOT X) OR ... (NOT (X OR Y) AND
Q)
Review comment:
```suggestion
// ---> (X AND A) OR (Y AND B AND NOT X) OR ... (NOT (X OR Y)
AND Q)
//
// Note: the rationale for this rewrite is that the expr can
then be further simplified
// using the existing rules for AND/OR
```
##########
File path: datafusion/src/optimizer/simplify_expressions.rs
##########
@@ -662,6 +664,48 @@ impl<'a> ExprRewriter for Simplifier<'a> {
_ => unreachable!(),
},
+ //
+ // Rules for Case
+ //
+
+ // CASE
+ // WHEN X THEN A
+ // WHEN Y THEN B
+ // ...
+ // ELSE Q
+ // END
+ //
+ // ---> (X AND A) OR (Y AND B AND NOT X) OR ... (NOT (X OR Y) AND
Q)
+ Case {
+ expr: None,
+ when_then_expr,
+ else_expr,
+ } if is_boolean_expr => {
+ // The disjunction of all the when predicates encountered so
far
Review comment:
Given the complexity of expression this could potentially produce
`O(n!)` maybe in terms of the number of `when` exprs, what do you think about
adding a heuristic that only does this rewrite when there are a 'small' number
of cases -- perhaps 2 or 3?
This would prevent some sort of pathological explosion and would handle the
usecase of simple mapping `CASE` statements
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]