jackkleeman opened a new pull request, #17743:
URL: https://github.com/apache/datafusion/pull/17743

   I am interested in predicates like:
   ```
   CASE WHEN X THEN "a" WHEN Y THEN "b" ... END = "a"
   CASE WHEN X THEN "a" WHEN Y THEN "b" ... END != "a"
   ```
   
   We already have a very nice 'case disaggregating' simplifier that can turn 
cases with boolean outputs into boolean logic statements. This PR adds some new 
functionality to make that simplifier more effective for literal comparisons:
   ## Literal pushdown
   We 'push down' literal comparisons into the case:
   ```
   CASE WHEN X THEN "a" WHEN Y THEN "b" ... END = "a"
   ->
   CASE WHEN X THEN "a"  = "a" WHEN Y THEN "b" = "a" ... END
   ```
   This leads to the output of the case statement to be a boolean, enabling the 
case disaggregation, but it also leads 'then' expressions being simplified into 
boolean literals, which enables extra optimisations below.
   
   ## Disaggregate larger case expressions if only a small number of paths can 
be true
   
   We update the case disaggregation logic to allow >=3 when_then expressions 
if we know for sure that <3 of the when_thens can ever output true. This is 
knowable if the thens are literal booleans ie null, false or true - we know 
that any literal null or false thens will be dropped from the final expression, 
avoiding the O(n^2) explosion that necessitates the limit in the first place.
   
   ## Case inversion
   While the case disaggregation is awesome for small cases or cases with a 
small number of literal 'true' outputs, its not good for cases with a small 
number of literal 'false' outputs but a large number of literal 'true' outputs. 
This is common for comparisons like (CASE ... != "a") as typically all but one 
when_then will output 'true'.
   
   In those cases, we invert the case like so:
   ```
   CASE WHEN X THEN true WHEN Y THEN false ... END
   ->
   NOT(CASE WHEN X THEN NOT(true) WHEN Y THEN NOT(false) ... END)
   ```
   In doing so, we can turn it into a case that has a small number of literal 
'true' outputs which will be picked up by the case disaggregation logic and 
turned into boolean logic statements. We rely on the fact that NOT(NOT(a)) === 
a for all literal bools.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to