alamb commented on issue #78:
URL: https://github.com/apache/arrow-datafusion/issues/78#issuecomment-827513734


   @Dandandan  -- yes, I think the "classic" thing to do is a "predicate
rewrite" pass that rearranges predicates for further downstream operations.
   
   The goal is basically to get the predicate into the form
   
   `good_predicate1` AND `good_predicate2` AND ...
   
   Where `good_predicate` means the predicate has special support in the 
execution engine. 
   
   Since OR is not typically handled specially, rewrites that turn an OR into an AND are helpful.
   
   Rewrite 1: (p AND q1) OR (p AND q2) OR (p AND ..) ==>  p AND (q1 OR q2 OR ..)
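
A minimal sketch of Rewrite 1, to make the factoring concrete. This is not DataFusion code: the `And` / `Or` classes and the `factor_common_conjuncts` helper are hypothetical names invented for illustration, and atoms are plain strings.

```python
from dataclasses import dataclass


# Hypothetical toy expression types; atoms are plain strings such as "p".
@dataclass(frozen=True)
class And:
    terms: tuple


@dataclass(frozen=True)
class Or:
    terms: tuple


def conjuncts(expr):
    """Top-level AND terms of expr (a bare atom is its own single conjunct)."""
    return expr.terms if isinstance(expr, And) else (expr,)


def factor_common_conjuncts(expr):
    """Rewrite (p AND q1) OR (p AND q2) OR .. into p AND (q1 OR q2 OR ..)."""
    if not isinstance(expr, Or) or not expr.terms:
        return expr
    branches = [conjuncts(b) for b in expr.terms]
    # Conjuncts that appear in every OR branch, in first-branch order.
    common = tuple(c for c in branches[0] if all(c in b for b in branches[1:]))
    if not common:
        return expr  # nothing to factor out
    rests = []
    for branch in branches:
        rest = tuple(c for c in branch if c not in common)
        if not rest:
            # One branch is exactly the common part: p OR (p AND q) == p.
            return common[0] if len(common) == 1 else And(common)
        rests.append(rest[0] if len(rest) == 1 else And(rest))
    return And(common + (Or(tuple(rests)),))


original = Or((And(("p", "q1")), And(("p", "q2"))))
rewritten = factor_common_conjuncts(original)
```

With this toy representation, `p` ends up as a top-level conjunct, so an engine with special support for `p` can apply it on its own before evaluating the remaining OR.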
   
   Another common rewrite I have seen is
   (col1 = A) OR (col1 = B) OR (col1 = C) ==> col1 IN (A, B, C)
   
   The execution engine can then treat this as a single-column predicate
(push down to scan), build a hash table for `(A, B, C)`, and do fast
filtering.
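
A rough sketch of the OR-to-IN rewrite and the hash-based filtering it enables. Again, this is an illustration and not DataFusion's implementation: `Eq`, `Or`, `InList`, `or_to_in_list`, and `filter_rows` are hypothetical names, rows are plain dicts, and the literal values are assumed to be hashable.

```python
from dataclasses import dataclass


# Hypothetical toy expression types, for illustration only.
@dataclass(frozen=True)
class Eq:
    column: str
    value: object


@dataclass(frozen=True)
class Or:
    terms: tuple


@dataclass(frozen=True)
class InList:
    column: str
    values: tuple


def or_to_in_list(expr):
    """Rewrite (col = A) OR (col = B) OR .. into col IN (A, B, ..)."""
    if not isinstance(expr, Or) or len(expr.terms) < 2:
        return expr
    if not all(isinstance(t, Eq) for t in expr.terms):
        return expr
    if len({t.column for t in expr.terms}) != 1:
        return expr  # the equalities must all be on the same column
    # Drop duplicate literals while keeping their original order.
    values = tuple(dict.fromkeys(t.value for t in expr.terms))
    return InList(expr.terms[0].column, values)


def filter_rows(rows, pred):
    """Evaluate an InList with a hash set built once, then probed per row."""
    lookup = frozenset(pred.values)
    return [row for row in rows if row[pred.column] in lookup]


pred = or_to_in_list(Or((Eq("col1", "A"), Eq("col1", "B"), Eq("col1", "C"))))
rows = [{"col1": "A"}, {"col1": "Z"}, {"col1": "C"}]
kept = filter_rows(rows, pred)
```

The per-row work becomes one hash probe instead of one comparison per OR branch, which is where the speedup for long lists would come from.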
   
   Shall I file an issue to track this kind of rewrite?
   

