[ https://issues.apache.org/jira/browse/FLINK-6037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jingzhang updated FLINK-6037:
-----------------------------
    Description: 
The estimateRowCount method of DataSetCalc has no effect in the following 
situation. 
If I run the following code,

{code}
Table table = tableEnv.sql(
    "select a, avg(a), sum(b), count(c) from t1 where a = 1 group by a");
{code}

the cost of each node in the optimized node tree is:

{code}
DataSetAggregate(groupBy=[a], select=[a, AVG(a) AS TMP_0, SUM(b) AS TMP_1, COUNT(c) AS TMP_2]): rowcount = 1000.0, cumulative cost = {3000.0 rows, 5000.0 cpu, 28000.0 io}
  DataSetCalc(select=[a, b, c], where=[=(a, 1)]): rowcount = 1000.0, cumulative cost = {2000.0 rows, 2000.0 cpu, 0.0 io}
      DataSetScan(table=[[_DataSetTable_0]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 1000.0 cpu, 0.0 io}
{code}

We expect the input rowcount of DataSetAggregate to be less than 1000 because 
of the filter a = 1. However, the actual input rowcount is still 1000 because 
the estimateRowCount method of DataSetCalc is not applied. 

The problem is similar to FLINK-5394 
(https://issues.apache.org/jira/browse/FLINK-5394), which has already been 
resolved.

I found that although we set the metadata provider to 
{{FlinkDefaultRelMetadataProvider}} in {{FlinkRelBuilder}}, after running 
{{planner.rel(...)}} to translate the SqlNode into a RelNode, the metadata 
provider is overridden from {{FlinkDefaultRelMetadataProvider}} back to 
{{DefaultRelMetadataProvider}}.
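
To make the override described above concrete, here is a minimal toy model in plain Java. It does not use the Calcite or Flink APIs: {{RowCountProvider}}, {{Cluster}}, {{configureBuilder}}, {{translateSql}} and {{ASSUMED_SELECTIVITY}} are all hypothetical stand-ins, and the selectivity value is only an illustrative assumption. It only shows how a provider installed early can be silently replaced by a later step, so the filter no longer reduces the estimate.

```java
public class MetadataProviderOverrideSketch {

    /** Hypothetical stand-in for a metadata provider that estimates a Calc's row count. */
    public interface RowCountProvider {
        double estimateCalcRowCount(double inputRows);
    }

    /** Illustrative assumption only; not the selectivity Flink or Calcite actually uses. */
    public static final double ASSUMED_SELECTIVITY = 0.5;

    /** Models the observed behaviour in the plan above: the Calc reports its input row count. */
    public static final RowCountProvider DEFAULT_PROVIDER = inputRows -> inputRows;

    /** Models a provider that accounts for the filter when estimating the row count. */
    public static final RowCountProvider FLINK_PROVIDER =
            inputRows -> inputRows * ASSUMED_SELECTIVITY;

    /** Hypothetical stand-in for the object that holds the active provider. */
    public static class Cluster {
        public RowCountProvider provider = DEFAULT_PROVIDER;
    }

    /** Models the builder setup step: installs the filter-aware provider. */
    public static void configureBuilder(Cluster cluster) {
        cluster.provider = FLINK_PROVIDER;
    }

    /** Models the SqlNode-to-RelNode step as reported: the provider is reset to the default. */
    public static void translateSql(Cluster cluster) {
        cluster.provider = DEFAULT_PROVIDER;
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();

        configureBuilder(cluster);
        System.out.println("after builder setup: "
                + cluster.provider.estimateCalcRowCount(1000.0));

        translateSql(cluster);
        System.out.println("after translation:   "
                + cluster.provider.estimateCalcRowCount(1000.0));
    }
}
```

In this toy model the estimate is below 1000 after the builder setup and returns to 1000 after the translation step, which mirrors the rowcount shown for DataSetCalc in the plan above.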

> the estimateRowCount method of DataSetCalc didn't work in SQL
> -------------------------------------------------------------
>
>                 Key: FLINK-6037
>                 URL: https://issues.apache.org/jira/browse/FLINK-6037
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: jingzhang
>            Assignee: jingzhang
>             Fix For: 1.2.0
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
