yordan-pavlov commented on issue #723:
URL: 
https://github.com/apache/arrow-datafusion/issues/723#issuecomment-880244444


   @mcassels thank you for reporting this - good find. The implementation of 
the pruning predicate assumes that, for a predicate expression `f(v) OP c`, 
where v is any value in a column chunk and c is a constant / literal, 
`f(v_min) <= f(v) <= f(v_max)` holds for every value v in that column chunk. 
In other words, it assumes that f preserves the ordering of values (f is 
monotonically non-decreasing). What I did not take into account in the initial 
implementation of the parquet predicate push-down is that this assumption is, 
sadly, not true for all functions supported by datafusion (the function 
`ABS(c0 - 251.10794896957802)` being one example).
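
   To make the failure concrete, here is a minimal Python sketch (not 
datafusion code). The chunk values, the threshold and the helper 
`naive_can_prune` are hypothetical and chosen only for illustration. The 
predicate is `ABS(c0 - 251.10794896957802) < 1`, and the pruning rule trusts 
`f(v_min)` as a lower bound for `f(v)`:

```python
C = 251.10794896957802  # constant taken from the reported expression


def f(v):
    """The expression from the issue: ABS(c0 - C)."""
    return abs(v - C)


def naive_can_prune(v_min, threshold):
    """Hypothetical pruning rule for the predicate `f(c0) < threshold`.

    It assumes f(v_min) is a lower bound for f(v) over the chunk, so the
    chunk is skipped whenever f(v_min) >= threshold.
    """
    return f(v_min) >= threshold


# Hypothetical column chunk; only its min and max are visible to pruning.
chunk = [200.0, 251.0, 300.0]
v_min, v_max = min(chunk), max(chunk)
threshold = 1.0

pruned = naive_can_prune(v_min, threshold)
matching = [v for v in chunk if f(v) < threshold]

print("chunk pruned:", pruned)
print("rows that actually match:", matching)
```

   In this sketch the chunk is pruned even though the value 251.0 satisfies 
the predicate, because `f(251.0)` is smaller than both `f(v_min)` and 
`f(v_max)`.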
   
   As a shorter-term fix, in order to ensure correctness, the predicate 
push-down feature could be limited to simpler expressions (such as `col < 
literal`).
   
   Longer term, in order to support predicate push-down with more complex 
expressions, there must be a way to find out if the predicate expression 
satisfies the assumption detailed above (such as having a list of compatible 
functions as @lvheyang suggested above), without actually evaluating the 
expression.
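
   One way such a check could look is sketched below in Python. This is only 
an illustration under stated assumptions: the expression classes, the 
`MONOTONIC_FUNCS` allow-list and the function `preserves_order` are all 
hypothetical and are not part of datafusion. The check walks the expression 
tree and never evaluates it, and it is deliberately conservative: anything it 
does not recognise is rejected.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, minimal expression tree (not datafusion's Expr type).
@dataclass
class Column:
    name: str


@dataclass
class Literal:
    value: float


@dataclass
class Call:
    func: str
    arg: "Expr"


@dataclass
class BinaryOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Literal, Call, BinaryOp]

# Hypothetical allow-list of functions that are monotonically non-decreasing.
MONOTONIC_FUNCS = {"floor", "ceil", "exp"}


def preserves_order(expr):
    """Return True only if expr is known to be non-decreasing in its column.

    Conservative: unknown functions, bare literals and any other shape
    return False, so pruning would be skipped for them.
    """
    if isinstance(expr, Column):
        return True
    if isinstance(expr, Call):
        # A composition of non-decreasing functions is non-decreasing.
        return expr.func in MONOTONIC_FUNCS and preserves_order(expr.arg)
    if isinstance(expr, BinaryOp):
        # Adding or subtracting a constant on the right keeps the ordering.
        return (
            expr.op in ("+", "-")
            and isinstance(expr.right, Literal)
            and preserves_order(expr.left)
        )
    return False


abs_expr = Call("abs", BinaryOp("-", Column("c0"), Literal(251.10794896957802)))
floor_expr = Call("floor", BinaryOp("+", Column("c0"), Literal(1.0)))
flipped_expr = BinaryOp("-", Literal(1.0), Column("c0"))
```

   With this sketch, `preserves_order(abs_expr)` is False because `abs` is not 
in the allow-list, `preserves_order(floor_expr)` is True, and 
`preserves_order(flipped_expr)` is False because `literal - col` reverses the 
ordering.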
   
   Sadly, I don't have time at the moment to work on this, due to things in my 
personal life.

