[
https://issues.apache.org/jira/browse/FLINK-12321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kurt Young closed FLINK-12321.
------------------------------
Resolution: Fixed
Fix Version/s: 1.9.0
fixed in 34f392d2218406cfec9b74fb0efddbcaa3764f89
> Supports sub-plan reuse
> -----------------------
>
> Key: FLINK-12321
> URL: https://issues.apache.org/jira/browse/FLINK-12321
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / Planner
> Reporter: godfrey he
> Assignee: godfrey he
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.9.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Many query plans have duplicated sub-plan, and their computing logic is
> identical. So we could reuse duplicated sub-plans to reduce computing.
> Digest of RelNode is an identifier to distinguish RelNode, and the digest of
> sub-plan contains all inner-nodes' digests. we could use the digest of
> sub-plan to find duplicated sub-plan.
> e.g.
> {code:sql}
> WITH r AS (SELECT a FROM x LIMIT 10)
> SELECT r1.a FROM r r1, r r2 WHERE r1.a = r2.a
> the physical plan after sub-plan reuse:
> Calc(select=[a])
> +- HashJoin(joinType=[InnerJoin], where=[=(a, a0)], select=[a, a0],
> isBroadcast=[true], build=[right])
> :- Exchange(distribution=[hash[a]])
> : +- Calc(select=[a], reuse_id=[1])
> : +- Limit(offset=[0], fetch=[10], global=[true])
> : +- Exchange(distribution=[single])
> : +- Limit(offset=[0], fetch=[10], global=[false])
> : +- TableSourceScan(table=[[x, source: [TestTableSource(a,
> b, c)]]], fields=[a, b, c])
> +- Exchange(distribution=[broadcast])
> +- Reused(reference_id=[1])
> {code}
> sub-plan: Calc-Limit-Exchange-Limit-TableSourceScan is reused.
> For batch job, reused node might lead to a deadlock when a HashJoin or
> NestedLoopJoin have same reused inputs. The probe side of HashJoin could
> start to read data only after build side has finished. If there is no full
> dam (DamBehavior#FULL_DAM) operators (e.g. Sort, HashAggregate) in probe
> side, the data will be blocked by probe side. In this case, we could set
> Exchange node (if it does not exist, add new one) as BATCH mode to break up
> the deadlock.(The Exchange node is a full dam operator now)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)