jorisvandenbossche commented on PR #14641: URL: https://github.com/apache/arrow/pull/14641#issuecomment-1315460858
> Do I need to worry about there being multiple filters for 202107 in that expression? In my example, there are 3 fragments in that partition.

I don't think you really have to worry about that (it won't change behaviour), although it might make it a bit less efficient to apply the filter (not sure by heart).

> 2\. There is no benefit in using `.isin()` for a single partition compared to `or`, right? In terms of performance/efficiency they are the same?

In general, I think an `isin` filter will certainly be more efficient than the equivalent with multiple boolean comparisons (that's one of the goals of having `isin`). But that's when talking about applying such a filter to actual, materialized data.

In the specific case where this filter only applies to partitioning fields, I suppose the situation is different. I am actually not fully sure whether the code that evaluates pushdown filters would understand an `isin` kernel. I _think_ this is handled in `SimplifyWithGuarantee`: https://github.com/apache/arrow/blob/058d4f697a06477539e7f9ccf3e7c035f8cfbc5e/cpp/src/arrow/compute/exec/expression.cc#L1144-L1188
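To make the "it won't change behaviour" point concrete, here is a small pure-Python toy model of pruning fragments by a partition value. It is only a sketch and not how Arrow implements this: the `Fragment` class, the `month` field name, the file paths, and both `prune_*` helpers are hypothetical names made up for illustration. It only shows that repeating 202107 in the filter selects the same fragments as listing it once, whether the filter is written as chained `or` comparisons or as an `isin`-style membership test.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Fragment:
    """Hypothetical stand-in for a dataset fragment with one partition key."""
    path: str
    month: int  # partition guarantee: every row in this fragment has this month


def prune_with_or(fragments: Iterable[Fragment], values: List[int]) -> List[Fragment]:
    """Keep fragments matching `month == v1 or month == v2 or ...`."""
    # One comparison per listed value, duplicates included.
    return [f for f in fragments if any(f.month == v for v in values)]


def prune_with_isin(fragments: Iterable[Fragment], values: List[int]) -> List[Fragment]:
    """Keep fragments whose month is a member of `values`."""
    value_set = frozenset(values)  # duplicates collapse here
    return [f for f in fragments if f.month in value_set]


# Made-up layout: three fragments share the 202107 partition.
fragments = [
    Fragment("month=202106/part-0.parquet", 202106),
    Fragment("month=202107/part-0.parquet", 202107),
    Fragment("month=202107/part-1.parquet", 202107),
    Fragment("month=202107/part-2.parquet", 202107),
    Fragment("month=202108/part-0.parquet", 202108),
]

# Values collected once per fragment, so 202107 shows up three times.
wanted = [202107, 202107, 202107]

selected_or = prune_with_or(fragments, wanted)
selected_isin = prune_with_isin(fragments, wanted)
deduplicated = prune_with_isin(fragments, sorted(set(wanted)))
```

In this toy model the duplicates only cost extra comparisons in the `or` form; the selected fragments are identical in all three cases. Whether the real pushdown code simplifies an `isin` expression against partition guarantees is exactly the part I am unsure about above, and this sketch does not answer it.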
