[
https://issues.apache.org/jira/browse/FLINK-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885098#comment-15885098
]
godfrey he commented on FLINK-5859:
-----------------------------------
Hi, [~fhueske], Thanks for your advice.
IMO, rules such as `PushProjectIntoBatchTableSourceScanRule`,
`PushFilterIntoBatchTableSourceScanRule` and `PartitionPruningRule` (which we
may integrate into `PushFilterIntoBatchTableSourceScanRule`) need to be applied
only once and do not actually need a cost model. Rules such as
`FilterCalcMergeRule`, `FilterJoinRule` and `DataSetCalcRule` do not need a
real cost; a dummy cost is enough. Rules such as `LoptOptimizeJoinRule` and
`JoinToMultiJoinRule` have to be applied with a real cost. So we want to break
the optimization phase down into 3 phases later. The whole optimization then
includes 5 steps:
1. decorrelate the query
2. normalize the logical plan with the HEP planner
3. optimize the logical plan with the Volcano planner and a dummy cost (including
`FilterCalcMergeRule`, `FilterJoinRule`, `DataSetCalcRule` and so on)
4. optimize the physical plan with the HEP planner (including
`PushProjectIntoBatchTableSourceScanRule`,
`PushFilterIntoBatchTableSourceScanRule` and so on)
5. optimize the physical plan with the Volcano planner and a real cost (including
`LoptOptimizeJoinRule`, `JoinToMultiJoinRule` and so on)
With this split, each optimization phase keeps its complexity as small as
possible, which should also address your concern.
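As a rough illustration only (a toy model, not actual Flink or Calcite planner code), the five steps can be written down as plain data. The names `Phase`, `PHASES` and `phase_of` are hypothetical, and each rule list contains only the rules named in this comment, so it is not exhaustive.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Phase:
    step: int
    goal: str
    planner: Optional[str]  # "HEP", "Volcano", or None where no planner is named
    cost: Optional[str]     # "dummy", "real", or None where no cost model is used
    rules: Tuple[str, ...]  # only the rules named in the comment ("and so on" omitted)


# Hypothetical description of the proposed pipeline, one entry per step.
PHASES = (
    Phase(1, "decorrelate the query", None, None, ()),
    Phase(2, "normalize the logical plan", "HEP", None, ()),
    Phase(3, "optimize the logical plan", "Volcano", "dummy",
          ("FilterCalcMergeRule", "FilterJoinRule", "DataSetCalcRule")),
    Phase(4, "optimize the physical plan", "HEP", None,
          ("PushProjectIntoBatchTableSourceScanRule",
           "PushFilterIntoBatchTableSourceScanRule")),
    Phase(5, "optimize the physical plan", "Volcano", "real",
          ("LoptOptimizeJoinRule", "JoinToMultiJoinRule")),
)


def phase_of(rule: str) -> Optional[int]:
    """Return the step in which `rule` is applied, or None if it is not listed."""
    for phase in PHASES:
        if rule in phase.rules:
            return phase.step
    return None


def rules_with_cost(cost: str) -> Tuple[str, ...]:
    """Collect the listed rules that run under the given cost model."""
    return tuple(r for p in PHASES if p.cost == cost for r in p.rules)
```

For example, `phase_of("PushFilterIntoBatchTableSourceScanRule")` gives step 4, and `rules_with_cost("real")` gives only the two join-ordering rules, so the expensive cost-based search is limited to the last step.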
Looking forward to your advice, thanks.
> support partition pruning on Table API & SQL
> --------------------------------------------
>
> Key: FLINK-5859
> URL: https://issues.apache.org/jira/browse/FLINK-5859
> Project: Flink
> Issue Type: New Feature
> Components: Table API & SQL
> Reporter: godfrey he
> Assignee: godfrey he
>
> Many data sources are partitionable storage, e.g. HDFS, Druid. And many
> queries just need to read a small subset of the total data. We can use
> partition information to prune or skip over files irrelevant to the user’s
> queries. Both query optimization time and execution time can be reduced
> noticeably, especially for a large partitioned table.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)