[ 
https://issues.apache.org/jira/browse/FLINK-15555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014914#comment-17014914
 ] 

hailong wang commented on FLINK-15555:
--------------------------------------

Thank you [~godfreyhe] for the detailed reply. TABLE_OPTIMIZER_REUSE_SOURCE_ENABLED 
is meant to let users control whether sources are reused, but sometimes it 
does not take effect. For the following SQL:
{code:java}
SELECT t1.a, t2.a FROM x t1, x t2 WHERE true
{code}
If we set TABLE_OPTIMIZER_REUSE_SOURCE_ENABLED = false, the final plan is:
{code:java}
Join(joinType=[InnerJoin], where=[true], select=[a, a0], leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey])
:- Exchange(distribution=[single], reuse_id=[1])
:  +- Calc(select=[a])
:     +- TableSourceScan(table=[[default_catalog, default_database, x, source: [TestTableSource(a, b, c)]]], fields=[a, b, c])
+- Reused(reference_id=[1])
{code}
Because the Exchange/Calc sub-plan above the scan is reused, its input, the 
TableSourceScan, is reused as well, even though source reuse is disabled.
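
In this plan, reuse_id=[1] marks the first occurrence of a shared node and Reused(reference_id=[1]) stands for every later occurrence, so the whole subtree below the Exchange, TableSourceScan included, exists only once. The toy printer below is a sketch, not Flink's explain code: the Node class and the reference-counting rule are hypothetical, chosen only to show how these markers arise when one node instance has two parents.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy explain printer for a plan DAG; a sketch, not Flink's RelNode explain code.
final class ReusePlanPrinter {

    // Hypothetical plan node: operator name, attribute string, inputs.
    static final class Node {
        final String name;
        final String attrs;
        final List<Node> inputs;

        Node(String name, String attrs, Node... inputs) {
            this.name = name;
            this.attrs = attrs;
            this.inputs = List.of(inputs);
        }
    }

    // Counts parent references per node instance (identity, not equality).
    // Children of an already-visited node are not counted again.
    static void countRefs(Node node, Map<Node, Integer> refs) {
        if (refs.merge(node, 1, Integer::sum) == 1) {
            for (Node in : node.inputs) {
                countRefs(in, refs);
            }
        }
    }

    static String explain(Node root) {
        Map<Node, Integer> refs = new IdentityHashMap<>();
        countRefs(root, refs);
        StringBuilder out = new StringBuilder();
        print(root, "", "", refs, new IdentityHashMap<>(), out);
        return out.toString();
    }

    // `head` prefixes this node's own line; `indent` prefixes every line below it.
    static void print(Node node, String head, String indent, Map<Node, Integer> refs,
                      Map<Node, Integer> ids, StringBuilder out) {
        Integer id = ids.get(node);
        if (id != null) {
            // Later occurrence of a shared node: print a back-reference only.
            out.append(head).append("Reused(reference_id=[").append(id).append("])\n");
            return;
        }
        String attrs = node.attrs;
        if (refs.get(node) > 1) {
            // First occurrence of a shared node: assign the next reuse id.
            id = ids.size() + 1;
            ids.put(node, id);
            attrs += ", reuse_id=[" + id + "]";
        }
        out.append(head).append(node.name).append('(').append(attrs).append(")\n");
        for (int i = 0; i < node.inputs.size(); i++) {
            boolean last = i == node.inputs.size() - 1;
            print(node.inputs.get(i), indent + (last ? "+- " : ":- "),
                  indent + (last ? "   " : ":  "), refs, ids, out);
        }
    }

    // Simplified version of the plan above: one Exchange instance feeds both join inputs.
    static String exampleExplain() {
        Node scan = new Node("TableSourceScan", "table=[x], fields=[a, b, c]");
        Node calc = new Node("Calc", "select=[a]", scan);
        Node exchange = new Node("Exchange", "distribution=[single]", calc);
        Node join = new Node("Join", "joinType=[InnerJoin], where=[true]", exchange, exchange);
        return explain(join);
    }

    public static void main(String[] args) {
        System.out.print(exampleExplain());
    }
}
```

Because the scan is only reachable through the shared Exchange, it is printed once, under reuse_id=[1], and never appears beside the Reused marker.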

So I think we can hide the TABLE_OPTIMIZER_REUSE_SOURCE_ENABLED option from 
users and only let them set TABLE_OPTIMIZER_REUSE_SUB_PLAN_ENABLED.

Users can still set TABLE_OPTIMIZER_REUSE_SUB_PLAN_ENABLED to decide whether 
sources are reused.
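
The argument can be checked on a toy model. The sketch below is not Flink's sub-plan reuser: the Node type, the string digests, and the rule that the source flag only stops a TableSourceScan from being matched on its own are assumptions chosen to mirror the behavior described in this issue. Under those assumptions, the self-join's scan is shared even with reuseSource=false, while two scans of x under different filters are shared only when the flag is on.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of digest-based sub-plan reuse; not Flink's actual implementation.
final class SubPlanReuseSketch {

    // Simplified plan node: an operator label plus its inputs (stand-in for a RelNode).
    static final class Node {
        final String label;
        final List<Node> inputs;

        Node(String label, Node... inputs) {
            this.label = label;
            this.inputs = List.of(inputs);
        }

        // Structural digest: equal digests mean the subtrees compute the same result.
        String digest() {
            StringBuilder sb = new StringBuilder(label);
            for (Node in : inputs) {
                sb.append('(').append(in.digest()).append(')');
            }
            return sb.toString();
        }

        boolean isSource() {
            return label.startsWith("TableSourceScan");
        }
    }

    // Top-down rewrite: a node whose digest was already seen is replaced by the first
    // instance, so its whole subtree (including any scan below it) becomes shared.
    // reuseSource=false only prevents a scan from being matched as a reuse root itself.
    static Node reuse(Node node, boolean reuseSource, Map<String, Node> seen) {
        boolean reusable = reuseSource || !node.isSource();
        Node existing = seen.get(node.digest());
        if (reusable && existing != null) {
            return existing;
        }
        Node[] inputs = new Node[node.inputs.size()];
        for (int i = 0; i < inputs.length; i++) {
            inputs[i] = reuse(node.inputs.get(i), reuseSource, seen);
        }
        Node rebuilt = new Node(node.label, inputs);
        if (reusable) {
            seen.put(node.digest(), rebuilt);
        }
        return rebuilt;
    }

    // Follows the first input `depth` times, e.g. Exchange -> Calc -> TableSourceScan.
    static Node descend(Node node, int depth) {
        for (int i = 0; i < depth; i++) {
            node = node.inputs.get(0);
        }
        return node;
    }

    // SELECT t1.a, t2.a FROM x t1, x t2 WHERE true: both sides are Exchange(Calc(Scan x)).
    static boolean selfJoinScansShared(boolean reuseSource) {
        Node left = new Node("Exchange[single]", new Node("Calc[a]", new Node("TableSourceScan[x]")));
        Node right = new Node("Exchange[single]", new Node("Calc[a]", new Node("TableSourceScan[x]")));
        Node plan = reuse(new Node("Join[true]", left, right), reuseSource, new HashMap<>());
        return descend(plan.inputs.get(0), 2) == descend(plan.inputs.get(1), 2);
    }

    // Mirrors the x side of the issue's plan: same scan under different filters.
    static boolean filteredScansShared(boolean reuseSource) {
        Node left = new Node("Calc[a<10]", new Node("TableSourceScan[x]"));
        Node right = new Node("Calc[a>5]", new Node("TableSourceScan[x]"));
        Node plan = reuse(new Node("Join[a=d]", left, right), reuseSource, new HashMap<>());
        return descend(plan.inputs.get(0), 1) == descend(plan.inputs.get(1), 1);
    }

    public static void main(String[] args) {
        System.out.println("self-join, reuseSource=false: " + selfJoinScansShared(false));
        System.out.println("filtered, reuseSource=false: " + filteredScansShared(false));
        System.out.println("filtered, reuseSource=true: " + filteredScansShared(true));
    }
}
```

Running main prints true, false, true: in this model the source flag only decides the case where the scans themselves are the duplicated roots, which is the gap this comment points out.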

 

> Delete TABLE_OPTIMIZER_REUSE_SOURCE_ENABLED  option for subplaner reuse
> -----------------------------------------------------------------------
>
>                 Key: FLINK-15555
>                 URL: https://issues.apache.org/jira/browse/FLINK-15555
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>    Affects Versions: 1.10.0
>            Reporter: hailong wang
>            Priority: Major
>             Fix For: 1.11.0
>
>
> The Blink planner supports sub-plan reuse. If 
> TABLE_OPTIMIZER_REUSE_SUB_PLAN_ENABLED is true, the optimizer will try to 
> find duplicated sub-plans and reuse them. And if 
> TABLE_OPTIMIZER_REUSE_SOURCE_ENABLED is true, the optimizer will try to find 
> duplicated table sources and reuse them.
> The TABLE_OPTIMIZER_REUSE_SOURCE_ENABLED option is meant to define whether a 
> TableSourceScan should be reused.
> But if the parent RelNode of a TableSourceScan can be reused, the scan is 
> reused too, even if TABLE_OPTIMIZER_REUSE_SOURCE_ENABLED is false, as in the 
> following SQL:
> {code:java}
> WITH t AS (SELECT a, b, e FROM x, y WHERE x.a = y.d)
> SELECT t1.*, t2.* FROM t t1, t t2 WHERE t1.b = t2.e AND t1.a < 10 AND t2.a > 5
> {code}
> The plan may look as follows:
> {code:java}
> HashJoin(joinType=[InnerJoin], where=[=(b, e0)], select=[a, b, e, a0, b0, e0], build=[right])
> :- Exchange(distribution=[hash[b]], shuffle_mode=[BATCH])
> :  +- Calc(select=[a, b, e])
> :     +- HashJoin(joinType=[InnerJoin], where=[=(a, d)], select=[a, b, d, e], build=[left])
> :        :- Exchange(distribution=[hash[a]])
> :        :  +- Calc(select=[a, b], where=[<(a, 10)])
> :        :     +- TableSourceScan(table=[[default_catalog, default_database, x, source: [TestTableSource(a, b, c)]]], fields=[a, b, c])
> :        +- Exchange(distribution=[hash[d]], reuse_id=[1])
> :           +- Calc(select=[d, e])
> :              +- TableSourceScan(table=[[default_catalog, default_database, y, source: [TestTableSource(d, e, f)]]], fields=[d, e, f])
> +- Exchange(distribution=[hash[e]])
>    +- Calc(select=[a, b, e])
>       +- HashJoin(joinType=[InnerJoin], where=[=(a, d)], select=[a, b, d, e], build=[left])
>          :- Exchange(distribution=[hash[a]])
>          :  +- Calc(select=[a, b], where=[>(a, 5)])
>          :     +- TableSourceScan(table=[[default_catalog, default_database, x, source: [TestTableSource(a, b, c)]]], fields=[a, b, c])
>          +- Reused(reference_id=[1])
> {code}
> So I think this option is useless; 
> TABLE_OPTIMIZER_REUSE_SUB_PLAN_ENABLED alone is enough.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
