jorisvandenbossche commented on PR #14641:
URL: https://github.com/apache/arrow/pull/14641#issuecomment-1315460858

   > Do I need to worry about there being multiple filters for 202107 in that 
expression? In my example, there are 3 fragments in that partition.
   
   I don't think you really have to worry about that (it won't change 
behaviour), although it might make applying the filter a bit less efficient 
(I'm not sure offhand).
   
   > 2\. There is no benefit in using `.isin()` for a single partition compared 
to `or`, right? In terms of performance/efficiency they are the same?
   
   In general, I think an `isin` filter will be more efficient than the 
equivalent chain of boolean comparisons (that's one of the goals of having 
`isin`). 
   But that holds when applying such a filter to actual, materialized data. 
In the specific case where the filter only applies to partitioning fields, I 
suppose the situation is different. I am not fully sure whether the code that 
evaluates pushdown filters understands an `isin` kernel. I _think_ this is 
handled in `SimplifyWithGuarantee`:
   
   
https://github.com/apache/arrow/blob/058d4f697a06477539e7f9ccf3e7c035f8cfbc5e/cpp/src/arrow/compute/exec/expression.cc#L1144-L1188
   

