westonpace commented on issue #36283:
URL: https://github.com/apache/arrow/issues/36283#issuecomment-1613227675

   This is expected but could be improved. Parquet predicate pushdown works like so:
   
    * Extract a row group guarantee from parquet statistics (e.g. `30 < x < 70 && 0 < y < 100`)
    * Call `SimplifyWithGuarantee` on the filter, given the above guarantee
      * For example, a filter `x == 100 && z < 20` would simplify to `false`.
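   A toy Python sketch of these two steps may help. It is not Arrow's API: the `Cmp` class, the interval-based `Guarantee`, and `simplify` are hypothetical names. It assumes the filter is a plain conjunction of comparisons and treats the bounds as strict, as in the example guarantee above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical guarantee: column name -> (lo, hi), meaning lo < column < hi.
# In real predicate pushdown these bounds would come from row-group statistics.
Guarantee = dict[str, tuple[float, float]]


@dataclass(frozen=True)
class Cmp:
    """Hypothetical filter term: `column op value`."""
    column: str
    op: str  # "==", "<" or ">"
    value: float


def can_match(term: Cmp, guarantee: Guarantee) -> bool:
    """Return False only if the guarantee proves no row can satisfy `term`."""
    if term.column not in guarantee:
        return True  # no statistics for this column, so nothing is ruled out
    lo, hi = guarantee[term.column]
    if term.op == "==":
        return lo < term.value < hi
    if term.op == "<":
        return term.value > lo  # some value in (lo, hi) is below term.value
    if term.op == ">":
        return term.value < hi  # some value in (lo, hi) is above term.value
    raise ValueError(f"unsupported op: {term.op!r}")


def simplify(conjuncts: list[Cmp], guarantee: Guarantee) -> bool | list[Cmp]:
    """Return False if the AND of `conjuncts` is impossible under the guarantee.

    Otherwise the conjuncts are returned unchanged (this toy does not try to
    drop terms that are always true).
    """
    if any(not can_match(term, guarantee) for term in conjuncts):
        return False
    return conjuncts


# Guarantee and filter from the example in the text.
row_group = {"x": (30, 70), "y": (0, 100)}
print(simplify([Cmp("x", "==", 100), Cmp("z", "<", 20)], row_group))  # False
```

   Because `x == 100` cannot hold when `30 < x < 70`, the whole conjunction is `false` and the row group can be skipped, regardless of `z`.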
   
   The `SimplifyWithGuarantee` method does not understand `isin`. It could be improved to do so if someone were interested. I think the place to make the change would be here:
https://github.com/apache/arrow/blob/apache-arrow-12.0.1/cpp/src/arrow/compute/expression.cc#L1230
   
   First, we "extract known values" (places in the guarantee where we have something like `x == 7`). This usually wouldn't apply because equality guarantees come from partitioning and not from parquet statistics.
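   The "extract known values" step can be sketched in the same toy style, assuming a guarantee that comes from partitioning (for example a hypothetical `year=2023` directory). `Eq`, `extract_known_values`, and `simplify_with_known_values` are illustrative names, not Arrow functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eq:
    """Hypothetical equality term: `column == value`."""
    column: str
    value: object


def extract_known_values(guarantee: list) -> dict:
    """Collect column -> value from the equality terms of a conjunctive guarantee."""
    return {term.column: term.value for term in guarantee if isinstance(term, Eq)}


def simplify_with_known_values(conjuncts: list, known: dict):
    """Fold equality terms on known columns into True/False.

    Returns False if any conjunct is known to be false, True if every conjunct
    is known to be true, and otherwise the conjuncts that remain undecided.
    """
    remaining = []
    for term in conjuncts:
        if isinstance(term, Eq) and term.column in known:
            if known[term.column] != term.value:
                return False  # one false conjunct makes the whole AND false
            continue  # known to be true, so it can be dropped
        remaining.append(term)
    return remaining if remaining else True


known = extract_known_values([Eq("year", 2023)])
print(simplify_with_known_values([Eq("year", 2022), Eq("x", 7)], known))  # False
```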
   
   Second, we consider inequalities in the guarantee. This is the part that is critical for parquet predicate pushdown. We then call `Inequality::Simplify`, which looks for places in the filter that are:
   
    * calls to `is_valid` or `is_null` (these might be simplified by an inequality)
    * comparisons (these might also be simplified by an inequality)
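   A toy version of the two cases listed above might look like the following. It is a sketch under stated assumptions, not Arrow's `Inequality::Simplify`: the guarantee is a single hypothetical `Bound` (`lo < column < hi`) that is assumed to rule out nulls, and filter terms are plain tuples.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """Hypothetical guarantee on one column: lo < column < hi, and no nulls."""
    column: str
    lo: float = -math.inf
    hi: float = math.inf


def simplify_term(term: tuple, g: Bound):
    """Simplify ("is_valid", col), ("is_null", col), ("<", col, v) or (">", col, v).

    Returns True or False when the guarantee decides the term, else the term unchanged.
    """
    kind, column = term[0], term[1]
    if column != g.column:
        return term  # the guarantee says nothing about this column
    if kind == "is_valid":
        return True  # assumption: the guarantee excludes nulls
    if kind == "is_null":
        return False
    value = term[2]
    if kind == "<":
        if value <= g.lo:
            return False  # every column value is > lo >= value
        if value >= g.hi:
            return True  # every column value is < hi <= value
    elif kind == ">":
        if value >= g.hi:
            return False  # every column value is < hi <= value
        if value <= g.lo:
            return True  # every column value is > lo >= value
    return term


x_above_100 = Bound("x", lo=100)
terms = [("is_null", "x"), ("<", "x", 50), (">", "x", 200)]
print([simplify_term(t, x_above_100) for t in terms])
# [False, False, ('>', 'x', 200)]
```

   The last term stays undecided because `x > 100` neither implies nor rules out `x > 200`.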
   
   I think the point you are making is that `isin` is another function that may be simplified by an inequality. If we know that `x > 100` and the filter is `isin(x, [0, 7, 12])`, then we can simplify this to `literal(false)`.
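   The proposed `isin` case could follow the same pattern as the comparisons. The sketch below is hypothetical (the function name and signature are illustrative, not Arrow's) and assumes the value set contains no nulls and the guarantee is a strict range `lo < x < hi`. Beyond what the comment describes, it also drops candidates that fall outside the range, which is one possible design choice.

```python
import math


def simplify_isin(values: list, lo: float = -math.inf, hi: float = math.inf):
    """Simplify `isin(x, values)` given the guarantee lo < x < hi.

    Returns False when no candidate can lie in the guaranteed range; otherwise
    returns the candidates that still can (possibly a smaller set than before).
    """
    surviving = [v for v in values if lo < v < hi]
    return surviving if surviving else False


print(simplify_isin([0, 7, 12], lo=100))  # False, matching the example in the text
```

   Null handling is deliberately left out here; a real implementation would need to respect how `isin` treats nulls.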