Hyunsik Choi created TAJO-893:
---------------------------------

             Summary: Shared data flow should be supported.
                 Key: TAJO-893
                 URL: https://issues.apache.org/jira/browse/TAJO-893
             Project: Tajo
          Issue Type: Sub-task
          Components: DAG
            Reporter: Hyunsik Choi


Please see the following example (TPC-H Q2). This query uses the 5-relation join 
twice, once in the scalar subquery and once in the outer query block. If the DAG 
framework supported a shared data channel and we reused the result of the join, 
the query could avoid duplicated scans, data shuffles, and joins.

For this feature, we should first support multiple output data channels. 
In addition, we should support a shared data channel that transmits the same 
intermediate data without duplicated shuffles.
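
To make the expected saving concrete, here is a minimal sketch of the idea. It is not Tajo code: ExecutionBlock, count_executions, and the block names are hypothetical and only model the shape of the Q2 plan. It counts how often each block would run when every consumer recomputes its input, compared with when a block's output is produced once and handed to all consumers through a shared channel.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionBlock:
    """Hypothetical plan node: a name plus the upstream blocks it reads from."""
    name: str
    inputs: list = field(default_factory=list)


def count_executions(root, share_outputs):
    """Return {block name: number of times the block is executed}.

    With share_outputs=False every consumer re-runs its whole input subtree.
    With share_outputs=True a block runs once and later consumers reuse its
    output, which stands in for a shared data channel.
    """
    runs = {}

    def visit(block):
        if share_outputs and block.name in runs:
            return  # output already produced; reuse it instead of re-running
        runs[block.name] = runs.get(block.name, 0) + 1
        for upstream in block.inputs:
            visit(upstream)

    visit(root)
    return runs


# Plan shape assumed for Q2: the join over partsupp, supplier, nation and
# region feeds both the outer join with part and the min(ps_supplycost) block.
scans = [ExecutionBlock("scan " + t)
         for t in ("partsupp", "supplier", "nation", "region")]
shared_join = ExecutionBlock("shared_join", scans)
min_cost = ExecutionBlock("min_cost", [shared_join])
outer_join = ExecutionBlock("outer_join",
                            [ExecutionBlock("scan part"), shared_join])
root = ExecutionBlock("root", [outer_join, min_cost])

duplicated = count_executions(root, share_outputs=False)
shared = count_executions(root, share_outputs=True)
print(duplicated["shared_join"], shared["shared_join"])
```

In this model the join and the four scans below it run twice without sharing and once with it. A real implementation would additionally need multiple output channels per block, since the two consumers may require different partitioning of the same data.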

Please see also TAJO-161. TAJO-161 would make good use of this feature.

{code}
select
  s_acctbal,
  s_name,
  n_name,
  p_partkey,
  p_mfgr,
  s_address,
  s_phone,
  s_comment
from
  part,
  supplier,
  partsupp,
  nation,
  region
where
  p_partkey = ps_partkey
  and s_suppkey = ps_suppkey
  and p_size = 15
  and p_type like '%BRASS'
  and s_nationkey = n_nationkey
  and n_regionkey = r_regionkey
  and r_name = 'EUROPE'
  and ps_supplycost =
    (
      select min(ps_supplycost) from partsupp, supplier, nation, region
      where
        p_partkey = ps_partkey
        and s_suppkey = ps_suppkey
        and s_nationkey = n_nationkey
        and n_regionkey = r_regionkey
        and r_name = 'EUROPE'
    )
order by 
  s_acctbal desc, 
  n_name, 
  s_name, 
  p_partkey
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
