[
https://issues.apache.org/jira/browse/TAJO-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638102#comment-14638102
]
ASF GitHub Bot commented on TAJO-1597:
--------------------------------------
GitHub user jihoonson opened a pull request:
https://github.com/apache/tajo/pull/643
TAJO-1597: Problem of ignoring theta join condition
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jihoonson/tajo-2 TAJO-1597
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tajo/pull/643.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #643
----
commit 9b21fd94a6619957bf70604b6edc1e86a8adf7d7
Author: Jihoon Son <[email protected]>
Date: 2015-07-23T03:31:50Z
TAJO-1597
----
> Problem of ignoring theta join condition
> ----------------------------------------
>
> Key: TAJO-1597
> URL: https://issues.apache.org/jira/browse/TAJO-1597
> Project: Tajo
> Issue Type: Bug
> Components: Planner/Optimizer
> Reporter: Jihoon Son
> Fix For: 0.11.0
>
>
> Tajo currently does not support theta join, non-equi theta join conditions
> must be evaluated as a filter after join execution. However, when non-equi
> theta join conditions are included at on clauses, those conditions are
> disappeared after filter push down optimization.
> This is because we assume that filters are pushed down from top during FPD
> phase, but theta join conditions from the on clause are not.
> For example, let me suppose a query that contains a projection after a join
> as follows.
> {noformat}
> projection
> |
> join (contains a theta join condition)
> / \
> scan scan
> {noformat}
> During FPD optimization, the theta join condition is come up to the
> projection node (FilterPushDownRule.visitProjection()). This condition is
> converted based on information of transformedMap, which is created by
> findCanPushdownAndTransform() before visiting the join node. Obviously, any
> information for the theta join condition are not contained in transformedMap,
> it is ignored.
> You can reproduce this bug as follows.
> {noformat}
> default> select n_nationkey, n_name, n_regionkey, t.cnt from nation n join (
> select r_regionkey, count(*) as cnt from nation n join region r on
> (n.n_regionkey = r.r_regionkey) group by r_regionkey ) t on
> (n.n_regionkey = t.r_regionkey) and n.n_nationkey > t.cnt order by
> n_nationkey;
> ...
> -----------------------------
> Query Block Graph
> -----------------------------
> |-#ROOT
> |-#QB_0
> -----------------------------
> Optimization Log:
> [LogicalPlan]
> > ProjectionNode is eliminated.
> > ProjectionNode is eliminated.
> [#QB_0]
> > Non-optimized join order: (default.nation ⋈θ default.region) (cost:
> 86513.6)
> > Optimized join order : (default.nation ⋈θ default.region) (cost:
> 86513.6)
> [#ROOT]
> > Non-optimized join order: (default.nation ⋈θ default.t) (cost:
> 1.924062464E7)
> > Optimized join order : (default.nation ⋈θ default.t) (cost:
> 1.924062464E7)
> -----------------------------
> SORT(8)
> => Sort Keys: default.n.n_nationkey (INT4) (asc)
> JOIN(12)(INNER)
> => Join Cond: default.n.n_regionkey (INT4) = default.t.r_regionkey (INT4)
> => target list: default.n.n_nationkey (INT4), default.n.n_name (TEXT),
> default.n.n_regionkey (INT4), default.t.cnt (INT8)
> => out schema: {(4) default.n.n_nationkey (INT4), default.n.n_name
> (TEXT), default.n.n_regionkey (INT4), default.t.cnt (INT8)}
> => in schema: {(5) default.n.n_nationkey (INT4), default.n.n_name
> (TEXT), default.n.n_regionkey (INT4), default.t.cnt (INT8),
> default.t.r_regionkey (INT4)}
> TABLE_SUBQUERY(6) as default.t
> => Targets: default.t.cnt (INT8), default.t.r_regionkey (INT4)
> => out schema: {(2) default.t.cnt (INT8), default.t.r_regionkey
> (INT4)}
> => in schema: {(2) default.t.r_regionkey (INT4), default.t.cnt
> (INT8)}
> GROUP_BY(4)(r_regionkey)
> => exprs: (count())
> => target list: default.r.r_regionkey (INT4), cnt (INT8)
> => out schema:{(2) default.r.r_regionkey (INT4), cnt (INT8)}
> => in schema:{(4) default.r.r_regionkey (INT4),
> default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name
> (TEXT)}
> JOIN(11)(INNER)
> => Join Cond: default.n.n_regionkey (INT4) =
> default.r.r_regionkey (INT4)
> => target list: default.r.r_regionkey (INT4),
> default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name
> (TEXT)
> => out schema: {(4) default.r.r_regionkey (INT4),
> default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name
> (TEXT)}
> => in schema: {(4) default.n.n_regionkey (INT4),
> default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.r.r_regionkey
> (INT4)}
> SCAN(2) on default.region as r
> => target list: default.r.r_regionkey (INT4)
> => out schema: {(1) default.r.r_regionkey (INT4)}
> => in schema: {(3) default.r.r_regionkey (INT4),
> default.r.r_name (TEXT), default.r.r_comment (TEXT)}
> SCAN(1) on default.nation as n
> => target list: default.n.n_regionkey (INT4),
> default.n.n_nationkey (INT4), default.n.n_name (TEXT)
> => out schema: {(3) default.n.n_regionkey (INT4),
> default.n.n_nationkey (INT4), default.n.n_name (TEXT)}
> => in schema: {(4) default.n.n_nationkey (INT4),
> default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.n.n_comment
> (TEXT)}
> SCAN(0) on default.nation as n
> => target list: default.n.n_nationkey (INT4), default.n.n_name
> (TEXT), default.n.n_regionkey (INT4)
> => out schema: {(3) default.n.n_nationkey (INT4), default.n.n_name
> (TEXT), default.n.n_regionkey (INT4)}
> => in schema: {(4) default.n.n_nationkey (INT4), default.n.n_name
> (TEXT), default.n.n_regionkey (INT4), default.n.n_comment (TEXT)}
> ...
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)