[
https://issues.apache.org/jira/browse/TAJO-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313661#comment-14313661
]
ASF GitHub Bot commented on TAJO-1277:
--------------------------------------
GitHub user sirpkt opened a pull request:
https://github.com/apache/tajo/pull/379
TAJO-1277: GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes
associativity of joins
Basically, the patch limits the range of join reordering to the joins that
lie between outer join operations.
For example, in the case of (((((a inner join b) inner join c) outer join
d) inner join e) inner join f),
the join ordering is partitioned into three parts:
1) (a inner join b) inner join c
2) (result of 1) outer join d
3) (((result of 2) inner join e) inner join f)
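The partitioning above can be sketched roughly as follows. This is only a
minimal illustration, assuming the joins are given as a left-deep chain; the
class JoinPartitionSketch and its JoinStep, JoinType, partition() and
exampleChain() members are hypothetical names made up for this example and
are not part of Tajo or of the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Tajo.
class JoinPartitionSketch {
    enum JoinType { INNER, OUTER }

    /** One step of a left-deep chain: "<result so far> <type> join <relation>". */
    static final class JoinStep {
        final JoinType type;
        final String relation;

        JoinStep(JoinType type, String relation) {
            this.type = type;
            this.relation = relation;
        }

        @Override
        public String toString() {
            return type + " " + relation;
        }
    }

    /**
     * Splits the chain into segments. Consecutive inner joins share a segment
     * and may be reordered among themselves; every outer join becomes a
     * segment of its own, so nothing is reordered across it.
     */
    static List<List<JoinStep>> partition(List<JoinStep> chain) {
        List<List<JoinStep>> segments = new ArrayList<>();
        List<JoinStep> current = new ArrayList<>();
        for (JoinStep step : chain) {
            if (step.type == JoinType.OUTER) {
                if (!current.isEmpty()) {
                    segments.add(current);          // close the inner-join run
                    current = new ArrayList<>();
                }
                segments.add(Collections.singletonList(step));
            } else {
                current.add(step);
            }
        }
        if (!current.isEmpty()) {
            segments.add(current);                  // trailing inner-join run
        }
        return segments;
    }

    /** The joins applied to base relation a in the example above. */
    static List<JoinStep> exampleChain() {
        return Arrays.asList(
            new JoinStep(JoinType.INNER, "b"),
            new JoinStep(JoinType.INNER, "c"),
            new JoinStep(JoinType.OUTER, "d"),
            new JoinStep(JoinType.INNER, "e"),
            new JoinStep(JoinType.INNER, "f"));
    }

    public static void main(String[] args) {
        for (List<JoinStep> segment : partition(exampleChain())) {
            System.out.println(segment);
        }
    }
}
```

For the example chain this yields three segments, [INNER b, INNER c],
[OUTER d] and [INNER e, INNER f], which correspond to parts 1) to 3).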
The following modifications are included:
- findBestOrder() is changed to partition the join ordering
- getBestPair() and findJoin() are changed to return the corresponding
JoinEdges of the selected join, because those JoinEdges must be removed before
the next round of join ordering
It passes 'mvn clean install' and the several join cases I tested;
however, I'm not sure this is a good approach.
Please leave comments on the patch.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sirpkt/tajo TAJO-1277
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tajo/pull/379.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #379
----
commit 74802e228dd3af8e885cc724c32d5746939c2c23
Author: Keuntae Park <[email protected]>
Date: 2015-02-09T09:21:04Z
join optimizer is enhanced to distinguish non-associative join cases
----
> GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of
> joins
> ----------------------------------------------------------------------------------
>
> Key: TAJO-1277
> URL: https://issues.apache.org/jira/browse/TAJO-1277
> Project: Tajo
> Issue Type: Bug
> Reporter: Keuntae Park
>
> It looks like GreedyHeuristicJoinOrderAlgorithm always assumes that all joins
> are associative.
> The following query returns an inaccurate result:
> {code}
> select * FROM
> customer c
> right outer join nation n on c.c_custkey = n.n_nationkey
> join region r on c.c_custkey = r.r_regionkey;
> {code}
> because GreedyHeuristicJoinOrderAlgorithm changes the join order to
> {code}
> select * FROM
> customer c
> join region r on c.c_custkey = r.r_regionkey
> right outer join nation n on c.c_custkey = n.n_nationkey;
> {code}
> I think getBestPair() should be fixed to avoid wrong join ordering.
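To see why the two orders in the issue description are not equivalent, the
following sketch evaluates both on toy data. Everything here is an assumption
made for illustration: the class OuterJoinOrderDemo, its helper methods, and
the key values (customer keys 1 and 2, nation keys 1 and 3, region keys 1 and
2) are hypothetical and not taken from Tajo or from any real data set. Each
row stores one key per table, and null marks a table padded in by the outer
join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; it only mimics the join semantics, not Tajo itself.
class OuterJoinOrderDemo {
    // Column positions of c.c_custkey, n.n_nationkey and r.r_regionkey in a row.
    static final int C = 0, N = 1, R = 2;

    /** Builds a single-table relation whose rows carry only the given column. */
    static List<Integer[]> scan(int column, Integer... keys) {
        List<Integer[]> rows = new ArrayList<>();
        for (Integer key : keys) {
            Integer[] row = new Integer[3];
            row[column] = key;
            rows.add(row);
        }
        return rows;
    }

    /** Combines two rows whose non-null columns do not overlap. */
    static Integer[] merge(Integer[] a, Integer[] b) {
        Integer[] out = new Integer[3];
        for (int i = 0; i < 3; i++) {
            out[i] = a[i] != null ? a[i] : b[i];
        }
        return out;
    }

    /** SQL-style equality: a null key never matches anything. */
    static boolean keysMatch(Integer[] l, int lc, Integer[] r, int rc) {
        return l[lc] != null && l[lc].equals(r[rc]);
    }

    static List<Integer[]> innerJoin(List<Integer[]> left, List<Integer[]> right, int lc, int rc) {
        List<Integer[]> out = new ArrayList<>();
        for (Integer[] l : left) {
            for (Integer[] r : right) {
                if (keysMatch(l, lc, r, rc)) out.add(merge(l, r));
            }
        }
        return out;
    }

    static List<Integer[]> rightOuterJoin(List<Integer[]> left, List<Integer[]> right, int lc, int rc) {
        List<Integer[]> out = new ArrayList<>();
        for (Integer[] r : right) {
            boolean matched = false;
            for (Integer[] l : left) {
                if (keysMatch(l, lc, r, rc)) {
                    out.add(merge(l, r));
                    matched = true;
                }
            }
            if (!matched) out.add(r.clone());   // keep the right row, left columns stay null
        }
        return out;
    }

    /** (c right outer join n on c = n) join r on c = r */
    static List<Integer[]> originalOrder() {
        List<Integer[]> c = scan(C, 1, 2), n = scan(N, 1, 3), r = scan(R, 1, 2);
        return innerJoin(rightOuterJoin(c, n, C, N), r, C, R);
    }

    /** (c join r on c = r) right outer join n on c = n */
    static List<Integer[]> reorderedOrder() {
        List<Integer[]> c = scan(C, 1, 2), n = scan(N, 1, 3), r = scan(R, 1, 2);
        return rightOuterJoin(innerJoin(c, r, C, R), n, C, N);
    }

    public static void main(String[] args) {
        for (Integer[] row : originalOrder()) System.out.println("original:  " + Arrays.toString(row));
        for (Integer[] row : reorderedOrder()) System.out.println("reordered: " + Arrays.toString(row));
    }
}
```

With this toy data the original order returns one row, while the reordered
plan returns two: the extra row is the unmatched nation row padded with
nulls, which the inner join with region would have filtered out had it run
last.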