[ 
https://issues.apache.org/jira/browse/TAJO-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639865#comment-14639865
 ] 

ASF GitHub Bot commented on TAJO-1597:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/tajo/pull/646


> Problem of ignoring theta join condition
> ----------------------------------------
>
>                 Key: TAJO-1597
>                 URL: https://issues.apache.org/jira/browse/TAJO-1597
>             Project: Tajo
>          Issue Type: Bug
>          Components: Planner/Optimizer
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>             Fix For: 0.11.0
>
>
> Tajo currently does not support theta join, non-equi theta join conditions 
> must be evaluated as a filter after join execution. However, when non-equi 
> theta join conditions are included at on clauses, those conditions are 
> disappeared after filter push down optimization.
> This is because we assume that filters are pushed down from top during FPD 
> phase, but theta join conditions from the on clause are not.
> For example, let me suppose a query that contains a projection after a join 
> as follows.
> {noformat}
> projection
>        |
>      join (contains a theta join condition)
>     /     \
> scan scan
> {noformat}
> During FPD optimization, the theta join condition is come up to the 
> projection node (FilterPushDownRule.visitProjection()). This condition is 
> converted based on information of transformedMap, which is created by 
> findCanPushdownAndTransform() before visiting the join node. Obviously, any 
> information for the theta join condition are not contained in transformedMap, 
> it is ignored.
> You can reproduce this bug as follows.
> {noformat}
> default> select n_nationkey, n_name, n_regionkey, t.cnt from nation n join (  
>  select r_regionkey, count(*) as cnt   from nation n   join region r on 
> (n.n_regionkey = r.r_regionkey)   group by r_regionkey ) t  on  
> (n.n_regionkey = t.r_regionkey) and n.n_nationkey > t.cnt  order by 
> n_nationkey;
> ...
> -----------------------------
> Query Block Graph
> -----------------------------
> |-#ROOT
>    |-#QB_0
> -----------------------------
> Optimization Log:
> [LogicalPlan]
>       > ProjectionNode is eliminated.
>       > ProjectionNode is eliminated.
> [#QB_0]
>       > Non-optimized join order: (default.nation ⋈θ default.region) (cost: 
> 86513.6)
>       > Optimized join order    : (default.nation ⋈θ default.region) (cost: 
> 86513.6)
> [#ROOT]
>       > Non-optimized join order: (default.nation ⋈θ default.t) (cost: 
> 1.924062464E7)
>       > Optimized join order    : (default.nation ⋈θ default.t) (cost: 
> 1.924062464E7)
> -----------------------------
> SORT(8)
>   => Sort Keys: default.n.n_nationkey (INT4) (asc)
>    JOIN(12)(INNER)
>      => Join Cond: default.n.n_regionkey (INT4) = default.t.r_regionkey (INT4)
>      => target list: default.n.n_nationkey (INT4), default.n.n_name (TEXT), 
> default.n.n_regionkey (INT4), default.t.cnt (INT8)
>      => out schema: {(4) default.n.n_nationkey (INT4), default.n.n_name 
> (TEXT), default.n.n_regionkey (INT4), default.t.cnt (INT8)}
>      => in schema: {(5) default.n.n_nationkey (INT4), default.n.n_name 
> (TEXT), default.n.n_regionkey (INT4), default.t.cnt (INT8), 
> default.t.r_regionkey (INT4)}
>       TABLE_SUBQUERY(6) as default.t
>         => Targets: default.t.cnt (INT8), default.t.r_regionkey (INT4)
>         => out schema: {(2) default.t.cnt (INT8), default.t.r_regionkey 
> (INT4)}
>         => in  schema: {(2) default.t.r_regionkey (INT4), default.t.cnt 
> (INT8)}
>          GROUP_BY(4)(r_regionkey)
>            => exprs: (count())
>            => target list: default.r.r_regionkey (INT4), cnt (INT8)
>            => out schema:{(2) default.r.r_regionkey (INT4), cnt (INT8)}
>            => in schema:{(4) default.r.r_regionkey (INT4), 
> default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name 
> (TEXT)}
>             JOIN(11)(INNER)
>               => Join Cond: default.n.n_regionkey (INT4) = 
> default.r.r_regionkey (INT4)
>               => target list: default.r.r_regionkey (INT4), 
> default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name 
> (TEXT)
>               => out schema: {(4) default.r.r_regionkey (INT4), 
> default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name 
> (TEXT)}
>               => in schema: {(4) default.n.n_regionkey (INT4), 
> default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.r.r_regionkey 
> (INT4)}
>                SCAN(2) on default.region as r
>                  => target list: default.r.r_regionkey (INT4)
>                  => out schema: {(1) default.r.r_regionkey (INT4)}
>                  => in schema: {(3) default.r.r_regionkey (INT4), 
> default.r.r_name (TEXT), default.r.r_comment (TEXT)}
>                SCAN(1) on default.nation as n
>                  => target list: default.n.n_regionkey (INT4), 
> default.n.n_nationkey (INT4), default.n.n_name (TEXT)
>                  => out schema: {(3) default.n.n_regionkey (INT4), 
> default.n.n_nationkey (INT4), default.n.n_name (TEXT)}
>                  => in schema: {(4) default.n.n_nationkey (INT4), 
> default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.n.n_comment 
> (TEXT)}
>       SCAN(0) on default.nation as n
>         => target list: default.n.n_nationkey (INT4), default.n.n_name 
> (TEXT), default.n.n_regionkey (INT4)
>         => out schema: {(3) default.n.n_nationkey (INT4), default.n.n_name 
> (TEXT), default.n.n_regionkey (INT4)}
>         => in schema: {(4) default.n.n_nationkey (INT4), default.n.n_name 
> (TEXT), default.n.n_regionkey (INT4), default.n.n_comment (TEXT)}
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to