16pierre commented on issue #8609: URL: https://github.com/apache/datafusion/issues/8609#issuecomment-3773840878
Hi, we've been hitting the limit of 20 on `PruningPredicate` in production and are wondering about paths forward to solve this problem.

In the short term, being able to configure the limit a bit above 20 could solve the majority of our problems. I'm not exactly sure how the magic constant 20 was determined; I take it this is because of tradeoffs with the implementation that translates `InList` into deeply nested binary exprs?

A bit more ambitious/long-term: if I understand the various conversations/codepaths correctly, given that in our case **we're interested in page index pruning based on min-max**:

1. we'd need to introduce support for large `IsIn` when building `PruningPredicate`, at least on this [PagePruningAccessPlanFilter path](https://github.com/apache/datafusion/blob/b4ba1c6ea99db22a37b281693598a3dbb2d546c2/datafusion/datasource-parquet/src/page_filter.rs#L127-L130)
2. we may not strictly need to implement `contained`, because fallback to min-max is already handled [in PruningPredicate#prune](https://github.com/apache/datafusion/blob/1c86ec7f5244c3b2e6d3ac722640ef678a027a18/datafusion/pruning/src/pruning_predicate.rs#L545-L549) (called by [prune_pages_in_one_row_group](https://github.com/apache/datafusion/blob/b4ba1c6ea99db22a37b281693598a3dbb2d546c2/datafusion/datasource-parquet/src/page_filter.rs#L301))
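
To make the min-max reasoning above concrete, here is a minimal sketch of what pruning a large `IN` list against per-page min/max statistics could look like without expanding the list into nested binary expressions. Everything in it is hypothetical and for illustration only: `MinMax`, `may_contain_expanded`, `may_contain_sorted` and `prune_pages` are not DataFusion APIs, and it assumes a single non-null `i64` column.

```rust
// Hypothetical sketch, NOT DataFusion's API: min-max pruning for
// `col IN (v1, ..., vn)` over per-page statistics.

/// Min/max statistics for one container (e.g. one Parquet page).
#[derive(Debug, Clone, Copy)]
struct MinMax {
    min: i64,
    max: i64,
}

/// Expanded form: an OR over `min <= v AND v <= max` for each list value.
/// This mirrors rewriting the IN list into a chain of binary expressions
/// and costs O(n) comparisons per container.
fn may_contain_expanded(stats: MinMax, values: &[i64]) -> bool {
    values.iter().any(|&v| stats.min <= v && v <= stats.max)
}

/// Same answer with a pre-sorted list: binary-search for the first value
/// that is >= min, then check that it is also <= max.
/// This costs O(log n) comparisons per container.
fn may_contain_sorted(stats: MinMax, sorted_values: &[i64]) -> bool {
    let idx = sorted_values.partition_point(|&v| v < stats.min);
    sorted_values.get(idx).map_or(false, |&v| v <= stats.max)
}

/// Returns one flag per page: `true` means the page may hold a match and
/// must be read, `false` means it can be skipped.
fn prune_pages(pages: &[MinMax], values: &[i64]) -> Vec<bool> {
    // Sort once, then reuse the sorted list for every page.
    let mut sorted = values.to_vec();
    sorted.sort_unstable();
    pages
        .iter()
        .map(|&page| may_contain_sorted(page, &sorted))
        .collect()
}
```

With the list sorted once up front, the per-page cost no longer grows linearly with the list length, which is the kind of property that would matter if the limit were raised well above 20.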
