Re: [I] Support "A column is known to be entirely NULL" in `PruningPredicate` [arrow-datafusion]

via GitHub Wed, 14 Feb 2024 08:54:40 -0800


appletreeisyellow commented on issue #9171:
URL: 
https://github.com/apache/arrow-datafusion/issues/9171#issuecomment-1944228016


   After discussion with @alamb, we plan to do the implementation in two phases:
   1. Turn each sub expression into a case expression
   2. Simplify the case expression and make it easy to read
   
   ## 1. Turn each sub expression into a case expression
   
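   Each sub expression will be rewritten into its own case expression, instead 
of wrapping the entire expression in a single case expression. Giving each sub 
expression its own case expression ensures that the pruning predicate rewrite 
logic is correct: a column that is entirely NULL then forces only the sub 
expressions that reference it to `false`, not the whole predicate. 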
   
   For example, `x < 5 AND x > 0 OR y = 10`
   
   will be rewritten into
   
   ```sql
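   -- x < 5: some row can be below 5 only if the minimum is below 5
   CASE
     WHEN x_null_count = x_row_count THEN false
     ELSE x_min < 5
   END
   AND
   -- x > 0: some row can be above 0 only if the maximum is above 0
   CASE
     WHEN x_null_count = x_row_count THEN false
     ELSE 0 < x_max
   END
   OR
   -- y = 10: 10 must fall within [y_min, y_max]
   CASE
     WHEN y_null_count = y_row_count THEN false
     ELSE y_min <= 10 AND 10 <= y_max
   END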
   ```
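
To illustrate why each sub expression gets its own guard, here is a minimal Python sketch that evaluates the rewritten predicate against per-container statistics. It is not DataFusion code: `ColumnStats`, `guarded`, `may_match`, and `may_match_single_case` are hypothetical names made up for this example, and the bound checks (`x_min < 5`, `x_max > 0`) are assumed to be the conservative "container may contain a match" forms.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ColumnStats:
    """Hypothetical per-container statistics for one column."""
    min: Optional[int]
    max: Optional[int]
    null_count: int
    row_count: int

    @property
    def all_null(self) -> bool:
        return self.null_count == self.row_count


def guarded(stats: ColumnStats, check: Callable[[ColumnStats], bool]) -> bool:
    """CASE WHEN null_count = row_count THEN false ELSE <check> END"""
    return False if stats.all_null else check(stats)


def may_match(x: ColumnStats, y: ColumnStats) -> bool:
    """Per-sub-expression rewrite of `x < 5 AND x > 0 OR y = 10`."""
    x_lt_5 = guarded(x, lambda s: s.min < 5)
    x_gt_0 = guarded(x, lambda s: 0 < s.max)
    y_eq_10 = guarded(y, lambda s: s.min <= 10 <= s.max)
    return (x_lt_5 and x_gt_0) or y_eq_10


def may_match_single_case(x: ColumnStats, y: ColumnStats) -> bool:
    """Contrast: one guard on x wrapped around the whole expression."""
    if x.all_null:
        return False
    return (x.min < 5 and 0 < x.max) or guarded(y, lambda s: s.min <= 10 <= s.max)


# x is entirely NULL in this container, but y may still contain 10.
x_null = ColumnStats(min=None, max=None, null_count=100, row_count=100)
y_hit = ColumnStats(min=5, max=20, null_count=0, row_count=100)

print(may_match(x_null, y_hit))              # True: container is kept
print(may_match_single_case(x_null, y_hit))  # False: container wrongly pruned
```

With a single guard around the whole expression, the all-NULL `x` column would prune a container that the `y = 10` branch could still match. The per-sub-expression guards avoid this.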
   
   However, as the example above shows, the final pruning predicate rewrite can 
be long and hard to read. Therefore, we need phase 2 to address this problem.
   
   ## 2. Simplify the case expression and make it easy to read
   
   Add formatting, such as `()` and new lines, to the expression string. I will 
have a better idea after the phase 1 PR is done.

