Hyunsik Choi created TAJO-893:
---------------------------------
Summary: Shared data flow should be supported.
Key: TAJO-893
URL: https://issues.apache.org/jira/browse/TAJO-893
Project: Tajo
Issue Type: Sub-task
Components: DAG
Reporter: Hyunsik Choi
Please see the following example (TPC-H Q2). This query uses a 5-relation join
twice: once in the scalar subquery and once in the outer query block. If the
DAG framework supports a shared data channel and we reuse the result of the
5-relation join, the query can avoid duplicated scans, data shuffles, and joins.
For this feature, first of all, we should support multiple output data channels.
In addition, we should support a shared data channel to transmit the same
intermediate data without duplicated shuffles.
Please see also TAJO-161. TAJO-161 would make good use of this feature.
{code}
select
  s_acctbal,
  s_name,
  n_name,
  p_partkey,
  p_mfgr,
  s_address,
  s_phone,
  s_comment
from
  part,
  supplier,
  partsupp,
  nation,
  region
where
  p_partkey = ps_partkey
  and s_suppkey = ps_suppkey
  and p_size = 15
  and p_type like '%BRASS'
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'EUROPE'
  and ps_supplycost =
  (
    select min(ps_supplycost) from partsupp, supplier, nation, region
    where
      p_partkey = ps_partkey
      and s_suppkey = ps_suppkey
      and s_nationkey = n_nationkey
      and n_regionkey = r_regionkey
      and r_name = 'EUROPE'
  )
order by
  s_acctbal desc,
  n_name,
  s_name,
  p_partkey
{code}
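To make the proposal above concrete, here is a minimal Python sketch of the idea, not Tajo code. It assumes a hypothetical SharedStage class that materializes a join result once and hands the same intermediate rows to every consumer. The class, the function names, and the sample rows are all invented for illustration. The two consumers stand in for the scalar subquery (minimum supply cost per part) and the outer query block (rows matching that minimum).

```python
class SharedStage:
    """Hypothetical execution block whose output is computed at most once
    and then served to any number of downstream consumers."""

    def __init__(self, name, compute):
        self.name = name
        self.compute = compute   # zero-argument function producing rows
        self.runs = 0            # how many times the work was really done
        self._cache = None

    def output(self):
        # First reader triggers the computation; later readers reuse it.
        if self._cache is None:
            self.runs += 1
            self._cache = self.compute()
        return self._cache


def join_region_suppliers():
    # Stand-in for the joined partsupp/supplier/nation/region rows
    # (made-up sample data, not real TPC-H output).
    return [
        {"ps_partkey": 1, "s_name": "A", "ps_supplycost": 10.0},
        {"ps_partkey": 1, "s_name": "B", "ps_supplycost": 7.5},
        {"ps_partkey": 2, "s_name": "C", "ps_supplycost": 3.0},
    ]


def min_cost_per_part(rows):
    # Consumer 1: plays the role of the scalar subquery.
    best = {}
    for row in rows:
        key = row["ps_partkey"]
        best[key] = min(best.get(key, row["ps_supplycost"]), row["ps_supplycost"])
    return best


def cheapest_suppliers(rows, best):
    # Consumer 2: plays the role of the outer query block.
    return [r for r in rows if r["ps_supplycost"] == best[r["ps_partkey"]]]


shared = SharedStage("shared_join", join_region_suppliers)
best = min_cost_per_part(shared.output())
result = cheapest_suppliers(shared.output(), best)
```

Both consumers call shared.output(), but the join function runs only once. In a real DAG framework the cache would correspond to shuffled intermediate data exposed through more than one output channel.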