[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #3380: RFC: Integrate `RowFilter` into `ParquetExec`

GitBox Tue, 06 Sep 2022 23:44:40 -0700


Ted-Jiang commented on PR #3380:
URL: https://github.com/apache/arrow-datafusion/pull/3380#issuecomment-1238974519


   >A separate conceptual question is around optimizing the number of distinct 
filters. In this design we simply assume that we want to break the filter into 
as many distinct predicates as we can but I'm not sure that is always the case 
given that this forces serial evaluation of the filters. I can imagine many 
cases where it would be better to group predicates together for evaluation. I 
didn't want to make the initial implementation too complicated so I punted on 
that for now, but eventually may want to do cost estimation at a higher level 
to determine the optimal grouping.
   
   @thinkharderdev Agreed! As I recall, each distinct filter is applied to the 
projected column using a `selection`.
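   To make the serial evaluation concrete, here is a minimal sketch. It is not arrow-rs's actual `RowFilter` API: it assumes each predicate reads a single in-memory `i64` column, and `Predicate` and `evaluate_serially` are hypothetical names. Each predicate only inspects rows still selected by the earlier ones and narrows the shared selection, which is why splitting a filter into many predicates forces them to run one after another.

```rust
/// Hypothetical row-level predicate over one projected column.
struct Predicate {
    /// Index of the column this predicate reads.
    column: usize,
    /// Test applied to a single value of that column.
    test: fn(i64) -> bool,
}

/// Evaluate predicates one after another. Each predicate only inspects rows
/// that are still selected and refines the selection in place.
fn evaluate_serially(columns: &[Vec<i64>], predicates: &[Predicate]) -> Vec<bool> {
    let num_rows = columns.first().map_or(0, |c| c.len());
    // Start with every row selected.
    let mut selection = vec![true; num_rows];
    for pred in predicates {
        let col = &columns[pred.column];
        for (row, selected) in selection.iter_mut().enumerate() {
            // Rows rejected by an earlier predicate are never looked at again.
            if *selected {
                *selected = (pred.test)(col[row]);
            }
        }
    }
    selection
}

fn main() {
    // Two projected columns with four rows each.
    let columns = vec![vec![1, 5, 7, 9], vec![10, 20, 30, 40]];
    let predicates = [
        Predicate { column: 0, test: |v| v > 4 },
        Predicate { column: 1, test: |v| v < 35 },
    ];
    // prints [false, true, true, false]
    println!("{:?}", evaluate_serially(&columns, &predicates));
}
```

   In a real Parquet reader the benefit is that later columns need not be decoded for rows that are already filtered out; grouping several predicates into one would reduce the number of passes at the cost of evaluating them on more rows.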
   
   One thing I want to mention: when applying filter pushdown to Parquet, some 
`filters exprs` are `partial_filters`, so they also remain in the `filter operator`. 
I think that, until now, all filters pushed down to Parquet (based on min/max 
statistics) have been `partial_filters` (is there any situation where pushdown 
to Parquet uses `full_filters`? 🤔).
   
   With this row filter, I think these could become `full_filters` (this needs 
some code changes in the pushdown rule implementation), and then we could 
eliminate the `filters exprs` in the `filter operator`. 🤔
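   A minimal sketch of the partial/full distinction, using hypothetical `PushdownKind`, `PushedFilter`, and `residual_filters` names: filters the scan only uses for pruning (such as min/max statistics) must stay in the Filter operator, while filters applied exactly, row by row, could be dropped from it. DataFusion's `TableProviderFilterPushDown` enum draws a similar `Inexact`/`Exact` distinction, but this code does not use it.

```rust
/// Hypothetical classification of how a scan handles a pushed-down filter.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PushdownKind {
    /// Used only for pruning (e.g. row-group min/max statistics): the scan
    /// may still return rows that fail the filter.
    Partial,
    /// Applied row by row (e.g. via a row filter): every returned row passes.
    Full,
}

/// A filter expression that was pushed into the scan.
struct PushedFilter {
    expr: &'static str,
    kind: PushdownKind,
}

/// Filters that the Filter operator above the scan must still evaluate.
fn residual_filters(pushed: &[PushedFilter]) -> Vec<&'static str> {
    pushed
        .iter()
        .filter(|f| f.kind == PushdownKind::Partial)
        .map(|f| f.expr)
        .collect()
}

fn main() {
    let pushed = [
        PushedFilter { expr: "a > 10", kind: PushdownKind::Full },
        PushedFilter { expr: "b LIKE '%x%'", kind: PushdownKind::Partial },
    ];
    // prints ["b LIKE '%x%'"]
    println!("{:?}", residual_filters(&pushed));
}
```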


-- 
This is an automated message from the Apache Git Service.
