[
https://issues.apache.org/jira/browse/FLINK-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885098#comment-15885098
]
godfrey he edited comment on FLINK-5859 at 2/27/17 4:12 AM:
------------------------------------------------------------
Hi [~fhueske], thanks for your advice.
IMO, rules such as {{PushProjectIntoBatchTableSourceScanRule}},
{{PushFilterIntoBatchTableSourceScanRule}}, and {{PartitionPruningRule}} (which
we may integrate into {{PushFilterIntoBatchTableSourceScanRule}}) need to be
applied only once and do not actually need a cost model. Rules such as
{{FilterCalcMergeRule}}, {{FilterJoinRule}}, and {{DataSetCalcRule}} do not
need real cost; a dummy cost is enough. Rules such as
{{LoptOptimizeJoinRule}} and {{JoinToMultiJoinRule}} need to be applied with
real cost. So we want to break the optimization down into 3 phases later.
The whole optimization includes 5 steps:
# decorrelate the query
# normalize the logical plan with the HEP planner
# optimize the logical plan with the Volcano planner and dummy cost (rules
include {{FilterCalcMergeRule}}, {{FilterJoinRule}}, {{DataSetCalcRule}}, and
so on)
# optimize the physical plan with the HEP planner (rules include
{{PushProjectIntoBatchTableSourceScanRule}},
{{PushFilterIntoBatchTableSourceScanRule}}, and so on)
# optimize the physical plan with the Volcano planner and real cost (rules
include {{LoptOptimizeJoinRule}}, {{JoinToMultiJoinRule}}, and so on)
That way, each optimization phase keeps its complexity as small as possible,
and your concern is addressed as well.
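As a rough illustration of the idea, here is a minimal, self-contained Java
sketch of such a phased pipeline. The class and method names are invented for
this sketch (they are not Flink or Calcite APIs), and a "plan" is modeled as a
plain list of operator names: a once-only HEP-style rewrite runs without any
cost model, and a separate Volcano-style step picks among candidates with a
(toy) cost function.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the phased optimization described above.
// None of these names exist in Flink or Calcite; they only illustrate
// separating once-only, cost-free rewrites from cost-based choices.
public class PhasedOptimizerSketch {

    // HEP-style phase (applied exactly once, no cost model):
    // push the projection into the table scan.
    static List<String> pushProjectIntoScan(List<String> plan) {
        List<String> out = new ArrayList<>(plan);
        if (out.remove("Project") && out.contains("Scan")) {
            out.set(out.indexOf("Scan"), "Scan(projected)");
        }
        return out;
    }

    // Volcano-style phase (cost-based): pick the cheaper of two candidate
    // plans; a stand-in for join reordering with real cost estimates.
    static List<String> chooseCheaper(List<String> a, List<String> b) {
        return a.size() <= b.size() ? a : b; // toy cost = operator count
    }

    static List<String> optimize(List<String> plan) {
        List<String> hepResult = pushProjectIntoScan(plan);
        // keep the unrewritten plan as a second candidate so the
        // cost-based phase has a real choice to make
        return chooseCheaper(hepResult, plan);
    }

    public static void main(String[] args) {
        // [Project, Filter, Scan] -> [Filter, Scan(projected)]
        System.out.println(optimize(List.of("Project", "Filter", "Scan")));
    }
}
```

In a real planner each phase would of course carry its own rule set and cost
model; the point of the sketch is only that the once-only rewrites never enter
the cost-based search space.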
Looking forward to your advice, thanks.
> support partition pruning on Table API & SQL
> --------------------------------------------
>
> Key: FLINK-5859
> URL: https://issues.apache.org/jira/browse/FLINK-5859
> Project: Flink
> Issue Type: New Feature
> Components: Table API & SQL
> Reporter: godfrey he
> Assignee: godfrey he
>
> Many data sources are partitioned storage, e.g. HDFS, Druid. And many
> queries only need to read a small subset of the total data. We can use
> partition information to prune or skip over files irrelevant to the user’s
> queries. Both query optimization time and execution time can be reduced
> significantly, especially for large partitioned tables.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)