alamb commented on issue #78: URL: https://github.com/apache/arrow-datafusion/issues/78#issuecomment-827513734
@Dandandan -- yes, I think the "classic" thing to do is a "predicate rewrite" pass that rearranges predicates for further downstream operations.

The goal is basically to get the predicate into the form `good_predicate1 AND good_predicate2 AND ...`, where `good_predicate` means a predicate that has special support in the execution engine. Since OR is not typically handled specially, rewrites to AND are helpful.

Rewrite 1, factoring out a common conjunct:

`(p AND q1) OR (p AND q2) OR (p AND ...)  ==>  p AND (q1 OR q2 OR ...)`

Another common rewrite I have seen:

`(col1 = A) OR (col1 = B) OR (col1 = C)  ==>  col1 IN (A, B, C)`

The execution engine can then treat this as a single-column predicate (push it down to the scan), build a hash table for `(A, B, C)`, and do fast filtering.

Shall I file an issue to track this kind of rewrite?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
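
For illustration only, the two rewrites mentioned in the comment can be sketched on a toy expression tree. Everything here is hypothetical: the types `Pred`, `Eq`, `InList`, `And`, `Or` and the functions `factor_common` and `rewrite_or_to_in` are made up for this sketch and are not DataFusion's API. Python is used only to keep the example short.

```python
from dataclasses import dataclass


# Hypothetical toy expression nodes (not DataFusion types).
@dataclass(frozen=True)
class Pred:
    name: str  # an opaque predicate such as "p" or "q1"


@dataclass(frozen=True)
class Eq:
    column: str
    value: object  # a literal; assumed hashable and non-NULL


@dataclass(frozen=True)
class InList:
    column: str
    values: tuple


@dataclass(frozen=True)
class And:
    terms: tuple


@dataclass(frozen=True)
class Or:
    terms: tuple


def factor_common(expr):
    """(p AND q1) OR (p AND q2) OR ...  ==>  p AND (q1 OR q2 OR ...)."""
    if not isinstance(expr, Or) or len(expr.terms) < 2:
        return expr
    # View every disjunct as a list of conjuncts.
    lists = [list(t.terms) if isinstance(t, And) else [t] for t in expr.terms]
    # Conjuncts that appear in every disjunct.
    common = [c for c in lists[0] if all(c in other for other in lists[1:])]
    if not common:
        return expr
    rest = []
    for conjuncts in lists:
        remaining = [c for c in conjuncts if c not in common]
        if not remaining:
            # One disjunct is exactly the common part: p OR (p AND q) == p.
            return common[0] if len(common) == 1 else And(tuple(common))
        rest.append(remaining[0] if len(remaining) == 1 else And(tuple(remaining)))
    return And(tuple(common) + (Or(tuple(rest)),))


def rewrite_or_to_in(expr):
    """(col = A) OR (col = B) OR ...  ==>  col IN (A, B, ...)."""
    if not isinstance(expr, Or) or not expr.terms:
        return expr
    if not all(isinstance(t, Eq) for t in expr.terms):
        return expr
    columns = {t.column for t in expr.terms}
    if len(columns) != 1:
        return expr  # equalities on different columns: leave unchanged
    # Deduplicate the literals while preserving their order.
    values = tuple(dict.fromkeys(t.value for t in expr.terms))
    return InList(columns.pop(), values)


p, q1, q2 = Pred("p"), Pred("q1"), Pred("q2")
factored = factor_common(Or((And((p, q1)), And((p, q2)))))
in_list = rewrite_or_to_in(Or((Eq("col1", "A"), Eq("col1", "B"), Eq("col1", "C"))))
```

In this sketch `factored` is `p AND (q1 OR q2)` and `in_list` is `col1 IN (A, B, C)`. A real optimizer pass would also need to recurse through the whole expression tree and take the engine's NULL handling into account, which this sketch leaves out.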
