[ https://issues.apache.org/jira/browse/FLINK-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924332#comment-15924332 ]
jingzhang commented on FLINK-6037:
----------------------------------
[~fhueske], this issue is different from
https://issues.apache.org/jira/browse/FLINK-5394; it only occurs in SQL.
I agree there is no difference between the Table API and SQL at the
optimization layer, since both are represented the same way there. However,
when {{SqlToRelConverter}} converts the SqlNode to a RelNode, the metadata
provider is overridden from {{FlinkDefaultRelMetadataProvider}} back to
{{DefaultRelMetadataProvider}} because of the following code:
{code}
val cluster: RelOptCluster = RelOptCluster.create(planner, rexBuilder)
val config = SqlToRelConverter.configBuilder()
  .withTrimUnusedFields(false).withConvertTableAccess(false).build()
val sqlToRelConverter: SqlToRelConverter = new SqlToRelConverter(
  new ViewExpanderImpl, validator, createCatalogReader, cluster,
  convertletTable, config)
{code}
So in the optimization phase, the Table API uses
{{FlinkDefaultRelMetadataProvider}}, but SQL uses
{{DefaultRelMetadataProvider}}.
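To make the mechanism concrete, here is a minimal, self-contained sketch of the override. It is a simplified model, not Calcite or Flink code: the {{Provider}} and {{Cluster}} types, the factory methods, and the 0.5 selectivity are all hypothetical stand-ins chosen for illustration. The only behavior it tries to reproduce is that a freshly created cluster starts with the default provider, so a custom provider set elsewhere is lost unless it is applied again.

```java
public class MetadataProviderSketch {

    /** Hypothetical stand-in for a metadata provider; not the Calcite interface. */
    interface Provider {
        String name();

        /** Estimated output rows of a Calc node, given its input rows. */
        double calcRowCount(double inputRows, boolean hasFilter);
    }

    /** Models the behaviour reported in the issue: the filter does not reduce the estimate. */
    static final Provider DEFAULT = new Provider() {
        public String name() { return "DefaultRelMetadataProvider"; }
        public double calcRowCount(double inputRows, boolean hasFilter) {
            return inputRows;
        }
    };

    /** Models a custom provider; the 0.5 selectivity is an assumed value. */
    static final Provider CUSTOM = new Provider() {
        public String name() { return "FlinkDefaultRelMetadataProvider"; }
        public double calcRowCount(double inputRows, boolean hasFilter) {
            return hasFilter ? Math.max(inputRows * 0.5, 1.0) : inputRows;
        }
    };

    /** Hypothetical stand-in for a cluster: a new instance always starts with DEFAULT. */
    static final class Cluster {
        private Provider provider = DEFAULT;
        void setProvider(Provider p) { this.provider = p; }
        Provider provider() { return provider; }
    }

    /** Mirrors the SQL path described above: fresh cluster, provider never re-applied. */
    static Cluster createCluster() {
        return new Cluster();
    }

    /** One possible remedy: re-apply the custom provider right after creation. */
    static Cluster createClusterWithCustomProvider() {
        Cluster cluster = new Cluster();
        cluster.setProvider(CUSTOM);
        return cluster;
    }

    public static void main(String[] args) {
        Cluster sqlPath = createCluster();
        Cluster fixedPath = createClusterWithCustomProvider();
        // Same Calc node (1000 input rows, with a filter) under both providers.
        System.out.println(sqlPath.provider().name() + ": "
            + sqlPath.provider().calcRowCount(1000.0, true));
        System.out.println(fixedPath.provider().name() + ": "
            + fixedPath.provider().calcRowCount(1000.0, true));
    }
}
```

In Calcite itself, {{RelOptCluster}} has a {{setMetadataProvider}} method, so calling it on the newly created cluster before it is handed to {{SqlToRelConverter}} may be one way to apply the same idea. That is a suggestion only, not a confirmed fix.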
> the estimateRowCount method of DataSetCalc didn't work in SQL
> -------------------------------------------------------------
>
> Key: FLINK-6037
> URL: https://issues.apache.org/jira/browse/FLINK-6037
> Project: Flink
> Issue Type: Sub-task
> Components: Table API & SQL
> Reporter: jingzhang
> Assignee: jingzhang
> Fix For: 1.2.0
>
>
> The estimateRowCount method of DataSetCalc didn't work in the following
> situation.
> If I run the following code,
> {code}
> Table table = tableEnv.sql("select a, avg(a), sum(b), count(c) from t1 where
> a = 1 group by a");
> {code}
> the cost of every node in the optimized node tree is:
> {code}
> DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1,
> COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows,
> 5000.0 cpu, 28000.0 io}
> DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0,
> cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io}
> DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative
> cost = {1000.0 rows, 1000.0 cpu, 0.0 io}
> {code}
> We expect the input rowcount of DataSetAggregate to be less than 1000;
> however, the actual input rowcount is still 1000 because the
> estimateRowCount method of DataSetCalc didn't work.
> The problem is similar to
> https://issues.apache.org/jira/browse/FLINK-5394, which is already solved.
> I found that although we set the metadata provider to
> {{FlinkDefaultRelMetadataProvider}} in {{FlinkRelBuilder}}, after running
> {code}planner.rel(...){code} to translate the SqlNode to a RelNode, the
> metadata provider is overridden from {{FlinkDefaultRelMetadataProvider}}
> back to {{DefaultRelMetadataProvider}} because of the following code:
> {code}
> val cluster: RelOptCluster = RelOptCluster.create(planner, rexBuilder)
> val config = SqlToRelConverter.configBuilder()
>   .withTrimUnusedFields(false).withConvertTableAccess(false).build()
> val sqlToRelConverter: SqlToRelConverter = new SqlToRelConverter(
>   new ViewExpanderImpl, validator, createCatalogReader, cluster,
>   convertletTable, config)
> {code}