Jihoon Son created TAJO-1597:
--------------------------------

             Summary: Problem of ignoring theta join condition
                 Key: TAJO-1597
                 URL: https://issues.apache.org/jira/browse/TAJO-1597
             Project: Tajo
          Issue Type: Bug
          Components: planner/optimizer
            Reporter: Jihoon Son
             Fix For: 0.11.0


Tajo currently does not support theta join, non-equi theta join conditions must 
be evaluated as a filter after join execution. However, when non-equi theta 
join conditions are included at on clauses, those conditions are disappeared 
after filter push down optimization.

This is because we assume that filters are pushed down from top during FPD 
phase, but theta join conditions from the on clause are not.

For example, let me suppose a query that contains a projection after a join as 
follows.
{noformat}
projection
       |
     join (contains a theta join condition)
    /     \
scan scan
{noformat}
During FPD optimization, the theta join condition is come up to the projection 
node (FilterPushDownRule.visitProjection()). This condition is converted based 
on information of transformedMap, which is created by 
findCanPushdownAndTransform() before visiting the join node. Obviously, any 
information for the theta join condition are not contained in transformedMap, 
it is ignored.

You can reproduce this bug as follows.
{noformat}
default> select n_nationkey, n_name, n_regionkey, t.cnt from nation n join (   
select r_regionkey, count(*) as cnt   from nation n   join region r on 
(n.n_regionkey = r.r_regionkey)   group by r_regionkey ) t  on  (n.n_regionkey 
= t.r_regionkey) and n.n_nationkey > t.cnt  order by n_nationkey;
...
-----------------------------
Query Block Graph
-----------------------------
|-#ROOT
   |-#QB_0
-----------------------------
Optimization Log:
[LogicalPlan]
        > ProjectionNode is eliminated.
        > ProjectionNode is eliminated.
[#QB_0]
        > Non-optimized join order: (default.nation ⋈θ default.region) (cost: 
86513.6)
        > Optimized join order    : (default.nation ⋈θ default.region) (cost: 
86513.6)
[#ROOT]
        > Non-optimized join order: (default.nation ⋈θ default.t) (cost: 
1.924062464E7)
        > Optimized join order    : (default.nation ⋈θ default.t) (cost: 
1.924062464E7)
-----------------------------

SORT(8)
  => Sort Keys: default.n.n_nationkey (INT4) (asc)
   JOIN(12)(INNER)
     => Join Cond: default.n.n_regionkey (INT4) = default.t.r_regionkey (INT4)
     => target list: default.n.n_nationkey (INT4), default.n.n_name (TEXT), 
default.n.n_regionkey (INT4), default.t.cnt (INT8)
     => out schema: {(4) default.n.n_nationkey (INT4), default.n.n_name (TEXT), 
default.n.n_regionkey (INT4), default.t.cnt (INT8)}
     => in schema: {(5) default.n.n_nationkey (INT4), default.n.n_name (TEXT), 
default.n.n_regionkey (INT4), default.t.cnt (INT8), default.t.r_regionkey 
(INT4)}
      TABLE_SUBQUERY(6) as default.t
        => Targets: default.t.cnt (INT8), default.t.r_regionkey (INT4)
        => out schema: {(2) default.t.cnt (INT8), default.t.r_regionkey (INT4)}
        => in  schema: {(2) default.t.r_regionkey (INT4), default.t.cnt (INT8)}
         GROUP_BY(4)(r_regionkey)
           => exprs: (count())
           => target list: default.r.r_regionkey (INT4), cnt (INT8)
           => out schema:{(2) default.r.r_regionkey (INT4), cnt (INT8)}
           => in schema:{(4) default.r.r_regionkey (INT4), 
default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name 
(TEXT)}
            JOIN(11)(INNER)
              => Join Cond: default.n.n_regionkey (INT4) = 
default.r.r_regionkey (INT4)
              => target list: default.r.r_regionkey (INT4), 
default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name 
(TEXT)
              => out schema: {(4) default.r.r_regionkey (INT4), 
default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name 
(TEXT)}
              => in schema: {(4) default.n.n_regionkey (INT4), 
default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.r.r_regionkey 
(INT4)}
               SCAN(2) on default.region as r
                 => target list: default.r.r_regionkey (INT4)
                 => out schema: {(1) default.r.r_regionkey (INT4)}
                 => in schema: {(3) default.r.r_regionkey (INT4), 
default.r.r_name (TEXT), default.r.r_comment (TEXT)}
               SCAN(1) on default.nation as n
                 => target list: default.n.n_regionkey (INT4), 
default.n.n_nationkey (INT4), default.n.n_name (TEXT)
                 => out schema: {(3) default.n.n_regionkey (INT4), 
default.n.n_nationkey (INT4), default.n.n_name (TEXT)}
                 => in schema: {(4) default.n.n_nationkey (INT4), 
default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.n.n_comment 
(TEXT)}
      SCAN(0) on default.nation as n
        => target list: default.n.n_nationkey (INT4), default.n.n_name (TEXT), 
default.n.n_regionkey (INT4)
        => out schema: {(3) default.n.n_nationkey (INT4), default.n.n_name 
(TEXT), default.n.n_regionkey (INT4)}
        => in schema: {(4) default.n.n_nationkey (INT4), default.n.n_name 
(TEXT), default.n.n_regionkey (INT4), default.n.n_comment (TEXT)}
...
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to