jackkleeman opened a new pull request, #17743: URL: https://github.com/apache/datafusion/pull/17743
I am interested in predicates like: ``` CASE WHEN X THEN "a" WHEN Y THEN "b" ... END = "a" CASE WHEN X THEN "a" WHEN Y THEN "b" ... END != "a" ``` We already have a very nice 'case disaggregating' simplifier that can turn cases with boolean outputs into boolean logic statements. This PR adds some new functionality to make that simplifier more effective for literal comparisons: ## Literal pushdown We 'push down' literal comparisons into the case: ``` CASE WHEN X THEN "a" WHEN Y THEN "b" ... END = "a" -> CASE WHEN X THEN "a" = "a" WHEN Y THEN "b" = "a" ... END ``` This leads to the output of the case statement to be a boolean, enabling the case disaggregation, but it also leads 'then' expressions being simplified into boolean literals, which enables extra optimisations below. ## Disaggregate larger case expressions if only a small number of paths can be true We update the case disaggregation logic to allow >=3 when_then expressions if we know for sure that <3 of the when_thens can ever output true. This is knowable if the thens are literal booleans ie null, false or true - we know that any literal null or false thens will be dropped from the final expression, avoiding the O(n^2) explosion that necessitates the limit in the first place. ## Case inversion While the case disaggregation is awesome for small cases or cases with a small number of literal 'true' outputs, its not good for cases with a small number of literal 'false' outputs but a large number of literal 'true' outputs. This is common for comparisons like (CASE ... != "a") as typically all but one when_then will output 'true'. In those cases, we invert the case like so: ``` CASE WHEN X THEN true WHEN Y THEN false ... END -> NOT(CASE WHEN X THEN NOT(true) WHEN Y THEN NOT(false) ... END) ``` In doing so, we can turn it into a case that has a small number of literal 'true' outputs which will be picked up by the case disaggregation logic and turned into boolean logic statements. We rely on the fact that NOT(NOT(a)) === a for all literal bools. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
