[jira] [Closed] (FLINK-12321) Supports sub-plan reuse

Kurt Young (JIRA) Fri, 26 Apr 2019 01:27:43 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-12321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Kurt Young closed FLINK-12321.
------------------------------
       Resolution: Fixed
    Fix Version/s: 1.9.0

fixed in 34f392d2218406cfec9b74fb0efddbcaa3764f89

> Supports sub-plan reuse
> -----------------------
>
>                 Key: FLINK-12321
>                 URL: https://issues.apache.org/jira/browse/FLINK-12321
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / Planner
>            Reporter: godfrey he
>            Assignee: godfrey he
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Many query plans have duplicated sub-plan, and their computing logic is 
> identical. So we could reuse duplicated sub-plans to reduce computing. 
> Digest of RelNode is an identifier to distinguish RelNode, and the digest of 
> sub-plan contains all inner-nodes' digests. we could use the digest of 
> sub-plan to find duplicated sub-plan.
> e.g. 
> {code:sql}
> WITH r AS (SELECT a FROM x LIMIT 10)
> SELECT r1.a FROM r r1, r r2 WHERE r1.a = r2.a
> the physical plan after sub-plan reuse:
> Calc(select=[a])
> +- HashJoin(joinType=[InnerJoin], where=[=(a, a0)], select=[a, a0], 
> isBroadcast=[true], build=[right])
>    :- Exchange(distribution=[hash[a]])
>    :  +- Calc(select=[a], reuse_id=[1])
>    :     +- Limit(offset=[0], fetch=[10], global=[true])
>    :        +- Exchange(distribution=[single])
>    :           +- Limit(offset=[0], fetch=[10], global=[false])
>    :              +- TableSourceScan(table=[[x, source: [TestTableSource(a, 
> b, c)]]], fields=[a, b, c])
>    +- Exchange(distribution=[broadcast])
>       +- Reused(reference_id=[1])
> {code}
> sub-plan: Calc-Limit-Exchange-Limit-TableSourceScan is reused.
> For batch job, reused node might lead to a deadlock when a HashJoin or 
> NestedLoopJoin have same reused inputs. The probe side of HashJoin could 
> start to read data only after build side has finished. If there is no full 
> dam (DamBehavior#FULL_DAM) operators (e.g. Sort, HashAggregate) in probe 
> side, the data will be blocked by probe side. In this case, we could set 
> Exchange node (if it does not exist, add new one) as BATCH mode to break up 
> the deadlock.(The Exchange node is a full dam operator now)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Closed] (FLINK-12321) Supports sub-plan reuse

Reply via email to