[ 
https://issues.apache.org/jira/browse/FLINK-20478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-20478:
-------------------------------

    Assignee:     (was: Jark Wu)

> Adjust the explain result
> -------------------------
>
>                 Key: FLINK-20478
>                 URL: https://issues.apache.org/jira/browse/FLINK-20478
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / Planner
>            Reporter: godfrey he
>            Priority: Major
>
> Currently, the explain result includes "Abstract Syntax Tree", "Optimized 
> Logical Plan" and "Physical Execution Plan". However, the "Optimized Logical 
> Plan" is actually an {{ExecNode}} graph, while 
> [ExplainDetail|https://github.com/apache/flink/blob/master/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/ExplainDetail.java]
> represents the expected explain details, currently {{ESTIMATED_COST}} and 
> {{CHANGELOG_MODE}}. Those details can only be used for Calcite {{RelNode}}s, 
> not for {{ExecNode}}s. So I suggest making the following adjustments:
> 1. Keep "Abstract Syntax Tree" as is; it represents the original 
> (un-optimized) {{RelNode}} graph converted from {{SqlNode}}.
> 2. Rename "Optimized Logical Plan" to "Optimized Physical Plan", which 
> represents the optimized physical {{RelNode}} graph composed of 
> {{FlinkPhysicalRel}}. {{ESTIMATED_COST}} and {{CHANGELOG_MODE}} describe the 
> expected explain details for "Optimized Physical Plan".
> 3. Replace "Physical Execution Plan" with "Optimized Execution Plan", which 
> represents the optimized {{ExecNode}} graph. Currently, many optimizations 
> are based on the {{ExecNode}} graph, such as sub-plan reuse and multiple-input 
> rewrite. We may introduce more optimizations in the future, so the 
> differences between "Optimized Physical Plan" and "Optimized Execution Plan" 
> will keep growing. We do not want to show two execution plans, and the 
> "Physical Execution Plan" for {{StreamGraph}} is less important than the 
> "Optimized Execution Plan". If we want to introduce "Physical Execution Plan" 
> in the future, we can add a type named "PHYSICAL_EXECUTION_PLAN" to 
> {{ExplainDetail}} to support it. There is already an issue for something 
> similar: [FLINK-19687|https://issues.apache.org/jira/browse/FLINK-19687].
> The following example shows the explain result after the adjustment:
> {code}
> == Abstract Syntax Tree ==
> LogicalLegacySink(name=[`default_catalog`.`default_database`.`upsertSink1`], fields=[a, cnt])
> +- LogicalProject(a=[$0], cnt=[$1])
>    +- LogicalFilter(condition=[>($1, 10)])
>       +- LogicalAggregate(group=[{0}], cnt=[COUNT()])
>          +- LogicalProject(a=[$0])
>             +- LogicalTableScan(table=[[default_catalog, default_database, MyTable1]])
> LogicalLegacySink(name=[`default_catalog`.`default_database`.`upsertSink2`], fields=[a, cnt])
> +- LogicalProject(a=[$0], cnt=[$1])
>    +- LogicalFilter(condition=[<($1, 10)])
>       +- LogicalAggregate(group=[{0}], cnt=[COUNT()])
>          +- LogicalProject(a=[$0])
>             +- LogicalTableScan(table=[[default_catalog, default_database, MyTable1]])
> == Optimized Physical Plan ==
> LegacySink(name=[`default_catalog`.`default_database`.`upsertSink1`], fields=[a, cnt])
> +- Calc(select=[a, cnt], where=[>(cnt, 10)])
>    +- GroupAggregate(groupBy=[a], select=[a, COUNT(*) AS cnt])
>       +- Exchange(distribution=[hash[a]])
>          +- Calc(select=[a])
>             +- DataStreamScan(table=[[default_catalog, default_database, MyTable1]], fields=[a, b, c])
> LegacySink(name=[`default_catalog`.`default_database`.`upsertSink2`], fields=[a, cnt])
> +- Calc(select=[a, cnt], where=[<(cnt, 10)])
>    +- GroupAggregate(groupBy=[a], select=[a, COUNT(*) AS cnt])
>       +- Exchange(distribution=[hash[a]])
>          +- Calc(select=[a])
>             +- DataStreamScan(table=[[default_catalog, default_database, MyTable1]], fields=[a, b, c])
> == Optimized Execution Plan ==
> GroupAggregate(groupBy=[a], select=[a, COUNT(*) AS cnt], reuse_id=[1])
> +- Exchange(distribution=[hash[a]])
>    +- Calc(select=[a])
>       +- DataStreamScan(table=[[default_catalog, default_database, MyTable1]], fields=[a, b, c])
> LegacySink(name=[`default_catalog`.`default_database`.`upsertSink1`], fields=[a, cnt])
> +- Calc(select=[a, cnt], where=[>(cnt, 10)])
>    +- Reused(reference_id=[1])
> LegacySink(name=[`default_catalog`.`default_database`.`upsertSink2`], fields=[a, cnt])
> +- Calc(select=[a, cnt], where=[<(cnt, 10)])
>    +- Reused(reference_id=[1])
> {code}
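
To illustrate why the "Optimized Execution Plan" above differs from the "Optimized Physical Plan", here is a minimal sketch of sub-plan reuse on a toy plan tree. It is not Flink's implementation: the `Node` class, its `digest` method, and `explain_with_reuse` are hypothetical names invented for this illustration, and the node attributes are shortened. It only shows the idea that a subtree appearing under several sinks is printed once with a `reuse_id` and referenced elsewhere via `Reused(reference_id=[...])`.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a plan node; not a Flink class."""
    name: str
    attrs: str
    inputs: List["Node"] = field(default_factory=list)

    def digest(self) -> str:
        # Subtrees with equal digests are treated as computing the same result.
        children = ",".join(child.digest() for child in self.inputs)
        return f"{self.name}({self.attrs})[{children}]"


def _line(depth: int, text: str) -> str:
    # Mimic the "+- " tree layout used in the explain output.
    return ("   " * (depth - 1) + "+- " if depth else "") + text


def explain_with_reuse(roots: List[Node]) -> str:
    # Pass 1: count how often each subtree digest occurs across all roots.
    counts: Counter = Counter()

    def count(node: Node) -> None:
        counts[node.digest()] += 1
        for child in node.inputs:
            count(child)

    for root in roots:
        count(root)

    reuse_ids: Dict[str, int] = {}
    shared: List[str] = []  # shared sub-plans, printed once up front
    body: List[str] = []    # the per-sink trees

    # Pass 2: render top-down; the topmost duplicated subtree gets a reuse id.
    def render(node: Node, depth: int, out: List[str], detect: bool = True) -> None:
        digest = node.digest()
        if detect and depth > 0 and counts[digest] > 1:
            if digest not in reuse_ids:
                reuse_ids[digest] = len(reuse_ids) + 1
                shared.append(f"{node.name}({node.attrs}, reuse_id=[{reuse_ids[digest]}])")
                # Simplification: nodes below a shared root are printed plainly.
                for child in node.inputs:
                    render(child, 1, shared, detect=False)
            out.append(_line(depth, f"Reused(reference_id=[{reuse_ids[digest]}])"))
            return
        out.append(_line(depth, f"{node.name}({node.attrs})"))
        for child in node.inputs:
            render(child, depth + 1, out, detect)

    for root in roots:
        render(root, 0, body)
    return "\n".join(shared + body)


def aggregate() -> Node:
    # The same aggregation is built independently for each sink.
    scan = Node("DataStreamScan", "table=[MyTable1]")
    calc = Node("Calc", "select=[a]", [scan])
    exchange = Node("Exchange", "distribution=[hash[a]]", [calc])
    return Node("GroupAggregate", "groupBy=[a]", [exchange])


sink1 = Node("LegacySink", "name=[upsertSink1]",
             [Node("Calc", "where=[>(cnt, 10)]", [aggregate()])])
sink2 = Node("LegacySink", "name=[upsertSink2]",
             [Node("Calc", "where=[<(cnt, 10)]", [aggregate()])])

plan = explain_with_reuse([sink1, sink2])
print(plan)
```

In this sketch the aggregation subtree is rendered a single time and both sinks point back to it, which mirrors the shape of the "Optimized Execution Plan" section in the example. A real planner would need a more careful notion of node equality than a string digest.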



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
