[ 
https://issues.apache.org/jira/browse/FLINK-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885098#comment-15885098
 ] 

godfrey he commented on FLINK-5859:
-----------------------------------

Hi [~fhueske], thanks for your advice. 

IMO, rules such as `PushProjectIntoBatchTableSourceScanRule`, 
`PushFilterIntoBatchTableSourceScanRule`, and `PartitionPruningRule` (which we 
may integrate into `PushFilterIntoBatchTableSourceScanRule`) need to be applied 
only once and do not actually need a cost model. Rules such as 
`FilterCalcMergeRule`, `FilterJoinRule`, and `DataSetCalcRule` do not need a 
real cost; a dummy cost is enough. Rules such as `LoptOptimizeJoinRule` and 
`JoinToMultiJoinRule` are applied with a real cost. So we want to break the 
optimization phase down into 3 phases later. The whole optimization would then 
include 5 steps: 
1. decorrelate the query
2. normalize the logical plan with the HEP planner
3. optimize the logical plan with the Volcano planner and a dummy cost 
(including `FilterCalcMergeRule`, `FilterJoinRule`, `DataSetCalcRule` and so on)
4. optimize the physical plan with the HEP planner (including 
`PushProjectIntoBatchTableSourceScanRule`, 
`PushFilterIntoBatchTableSourceScanRule` and so on)
5. optimize the physical plan with the Volcano planner and a real cost 
(including `LoptOptimizeJoinRule`, `JoinToMultiJoinRule` and so on)
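
To make the five steps above concrete, here is a rough sketch in plain Python 
rather than Calcite code. It only models which planner and cost model each 
phase would use and the order in which the rules would fire. `Phase`, 
`named_rule`, `build_phases` and `optimize` are made-up names for illustration, 
and the `Decorrelate` and `Normalize` entries are placeholders, since the 
concrete rules for steps 1 and 2 are not listed here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy stand-in for a relational plan: the names of the rules applied so far.
# A real planner would transform a tree of relational operators instead.
Plan = Tuple[str, ...]
Rule = Callable[[Plan], Plan]


def named_rule(name: str) -> Rule:
    """Hypothetical helper: a 'rule' that only records its own name."""
    def apply(plan: Plan) -> Plan:
        return plan + (name,)
    return apply


@dataclass
class Phase:
    name: str
    planner: str      # "HEP", "Volcano", or "none"
    cost_model: str   # "none", "dummy", or "real"
    rules: List[Rule]


def build_phases() -> List[Phase]:
    """The five proposed steps, in order (an assumption-laden sketch)."""
    return [
        # Steps 1 and 2: placeholder rule names, not real Calcite rules.
        Phase("decorrelate", "none", "none", [named_rule("Decorrelate")]),
        Phase("normalize", "HEP", "none", [named_rule("Normalize")]),
        # Step 3: logical optimization, a dummy cost is enough.
        Phase("logical", "Volcano", "dummy", [
            named_rule("FilterCalcMergeRule"),
            named_rule("FilterJoinRule"),
            named_rule("DataSetCalcRule"),
        ]),
        # Step 4: rules that only need to be applied once, no cost model.
        Phase("physical-pushdown", "HEP", "none", [
            named_rule("PushProjectIntoBatchTableSourceScanRule"),
            named_rule("PushFilterIntoBatchTableSourceScanRule"),
        ]),
        # Step 5: join reordering, which needs a real cost.
        Phase("physical-cost-based", "Volcano", "real", [
            named_rule("LoptOptimizeJoinRule"),
            named_rule("JoinToMultiJoinRule"),
        ]),
    ]


def optimize(plan: Plan, phases: List[Phase]) -> Plan:
    """Run every phase in order; each phase only sees its own rule set."""
    for phase in phases:
        for rule in phase.rules:
            plan = rule(plan)
    return plan


if __name__ == "__main__":
    for phase in build_phases():
        print(phase.name, phase.planner, phase.cost_model)
```

The point of the sketch is only that each phase carries a small, fixed rule 
set, so the cost-based phases never have to explore the push-down rules.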

That way, each optimization phase keeps its complexity as small as possible, 
which should also address your concern. 

Looking forward to your advice, thanks.

> support partition pruning on Table API & SQL
> --------------------------------------------
>
>                 Key: FLINK-5859
>                 URL: https://issues.apache.org/jira/browse/FLINK-5859
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: godfrey he
>            Assignee: godfrey he
>
> Many data sources use partitionable storage, e.g. HDFS, Druid. And many 
> queries just need to read a small subset of the total data. We can use 
> partition information to prune or skip over files irrelevant to the user’s 
> queries. Both query optimization time and execution time can be reduced 
> significantly, especially for a large partitioned table.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
