Ted-Jiang commented on PR #3380: URL: https://github.com/apache/arrow-datafusion/pull/3380#issuecomment-1238974519
> A separate conceptual question is around optimizing the number of distinct filters. In this design we simply assume that we want to break the filter into as many distinct predicates as we can but I'm not sure that is always the case given that this forces serial evaluation of the filters. I can imagine many cases where it would be better to group predicates together for evaluation. I didn't want to make the initial implementation too complicated so I punted on that for now, but eventually may want to do cost estimation at a higher level to determine the optimal grouping.

@thinkharderdev Agree! As I remember, each distinct filter is applied to the projected columns with a `selection`.

One thing I want to mention: when filter pushdown is applied to parquet, some `filter exprs` are `partial_filters`, so they also still exist in the `filter operator`. I think that until now all filters based on min/max statistics have been `partial_filters` (is there any situation where pushdown to parquet uses `full_filters`? 🤔). With this row_filter I think they become `full_filters` (we need some code changes in the pushdown rule implementation), and then we could eliminate those `filter exprs` from the `filter operator`. 🤔
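To make the partial vs. full distinction concrete, here is a minimal sketch of the idea, not DataFusion's actual API. The `PushdownSupport` enum and the `residual_filters` function are hypothetical names introduced only for illustration. It assumes each pushed-down predicate is tagged with how precisely the scan applies it, and that the `filter operator` only needs to keep the predicates that the scan does not fully enforce.

```rust
/// Hypothetical tag (not a DataFusion type) describing how precisely
/// a scan applies a predicate that was pushed down to it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PushdownSupport {
    /// The scan cannot use the predicate at all.
    Unsupported,
    /// The scan can only prune coarsely (for example with min/max
    /// statistics), so non-matching rows may still be returned.
    Partial,
    /// The scan evaluates the predicate on every row (the row_filter
    /// case), so no non-matching rows are returned.
    Full,
}

/// Returns the predicates that the filter operator must still evaluate:
/// everything the scan does not fully enforce.
fn residual_filters<'a>(filters: &[(&'a str, PushdownSupport)]) -> Vec<&'a str> {
    filters
        .iter()
        .filter(|(_, support)| *support != PushdownSupport::Full)
        .map(|(expr, _)| *expr)
        .collect()
}
```

Under these assumptions, a predicate tagged `Full` can be dropped from the `filter operator`, while `Partial` and `Unsupported` predicates have to stay there to keep the result correct.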
