16pierre commented on issue #8609:
URL: https://github.com/apache/datafusion/issues/8609#issuecomment-3773840878

   Hi, we've been hitting the 20-item `InList` limit in `PruningPredicate` in production
and are wondering about possible paths forward.
   
   In the short term, being able to configure the limit to somewhat above 20 would
solve the majority of our problems. I'm not sure how the magic
constant 20 was determined; I take it this comes from tradeoffs in the
implementation that translates `InList` into deeply nested binary exprs?
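
   To make the nesting concern concrete, here is a minimal sketch of how such a rewrite grows with the list size. It uses a toy expression type, not DataFusion's actual `Expr`, and the names `InRange`, `expand_in_list`, `depth`, and `eval` are made up for illustration. It assumes each `col = v` term is checked against a container's min/max as `min <= v AND v <= max`.

```rust
// Toy expression tree for illustration only; this is NOT DataFusion's `Expr`.
#[derive(Debug)]
enum Expr {
    // Stands for `col_min <= v AND v <= col_max`, the min-max form of `col = v`.
    InRange(i64),
    // Binary OR of two sub-expressions.
    Or(Box<Expr>, Box<Expr>),
}

/// Expand `col IN (values...)` into a left-deep chain of ORs.
/// Returns `None` for an empty list.
fn expand_in_list(values: &[i64]) -> Option<Expr> {
    let mut iter = values.iter().copied();
    let first = Expr::InRange(iter.next()?);
    Some(iter.fold(first, |acc, v| {
        Expr::Or(Box::new(acc), Box::new(Expr::InRange(v)))
    }))
}

/// Depth of the tree: grows linearly with the number of list values.
fn depth(e: &Expr) -> usize {
    match e {
        Expr::InRange(_) => 1,
        Expr::Or(l, r) => 1 + depth(l).max(depth(r)),
    }
}

/// Evaluate against one container's statistics: `true` means "may match, keep".
fn eval(e: &Expr, min: i64, max: i64) -> bool {
    match e {
        Expr::InRange(v) => min <= *v && *v <= max,
        Expr::Or(l, r) => eval(l, min, max) || eval(r, min, max),
    }
}
```

   With this shape, a list of `n` values yields a tree of depth `n`, so any recursive traversal or per-node evaluation cost scales with the list length. That is presumably why some cap exists, though the exact reasoning behind 20 is not something this sketch can confirm.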
   
   A bit more ambitious/long-term: if I understand the various
conversations/code paths correctly, and given that in our case **we're interested in page
index pruning based on min-max**:
   1. we'd need to introduce support for large `InList` when building
`PruningPredicate`, at least on this [PagePruningAccessPlanFilter 
path](https://github.com/apache/datafusion/blob/b4ba1c6ea99db22a37b281693598a3dbb2d546c2/datafusion/datasource-parquet/src/page_filter.rs#L127-L130)
   2. we may not strictly need to implement `contained`, because the fallback to
min-max is already handled [in 
PruningPredicate#prune](https://github.com/apache/datafusion/blob/1c86ec7f5244c3b2e6d3ac722640ef678a027a18/datafusion/pruning/src/pruning_predicate.rs#L545-L549)
(called by 
[prune_pages_in_one_row_group](https://github.com/apache/datafusion/blob/b4ba1c6ea99db22a37b281693598a3dbb2d546c2/datafusion/datasource-parquet/src/page_filter.rs#L301)).
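
   As a rough illustration of what min-max pruning for a large list could look like without expanding it into nested exprs, here is a self-contained sketch. It is not an existing DataFusion API: `PageStats` and `keep_pages` are hypothetical names, and integer values stand in for arbitrary scalars. The idea is to sort the list once and then, for each page, binary-search for the first value at or above the page's min and check that it does not exceed the page's max.

```rust
/// Per-page min/max statistics (hypothetical stand-in for page index data).
struct PageStats {
    min: i64,
    max: i64,
}

/// For each page, return `true` if some value of `in_list` may fall inside
/// the page's [min, max] range (keep the page), or `false` if none can
/// (the page can be pruned).
fn keep_pages(in_list: &[i64], pages: &[PageStats]) -> Vec<bool> {
    // Sort once so each page needs only a binary search.
    let mut sorted = in_list.to_vec();
    sorted.sort_unstable();

    pages
        .iter()
        .map(|p| {
            // Index of the first list value that is >= the page minimum.
            let idx = sorted.partition_point(|v| *v < p.min);
            // Keep the page only if that value also fits under the maximum.
            sorted.get(idx).map_or(false, |v| *v <= p.max)
        })
        .collect()
}
```

   For example, with pages covering `[0, 9]`, `[10, 19]`, and `[20, 29]` and the list `(25, 3, 100)`, the first and third pages are kept and the second is pruned. The per-page cost is logarithmic in the list size, so the list length would no longer need a small hard cap for this path.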


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

