[
https://issues.apache.org/jira/browse/TAJO-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313726#comment-14313726
]
ASF GitHub Bot commented on TAJO-1277:
--------------------------------------
Github user jihoonson commented on a diff in the pull request:
https://github.com/apache/tajo/pull/379#discussion_r24393011
--- Diff:
tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java
---
@@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan,
LogicalPlan.QueryBlock blo
JoinEdge bestPair;
while (remainRelations.size() > 1) {
+ Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
+
+ for (LogicalNode relation : remainRelations) {
+ Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
+ List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
+ String relationCollection = TUtil.collectionToString(relationStrings, ",");
+ List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
+ if (joinEdgesForGiven != null) {
+ joinEdges.addAll(joinEdgesForGiven);
+ }
+ for (String relationString: relationStrings) {
+ joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
+ if (joinEdgesForGiven != null) {
+ joinEdges.addAll(joinEdgesForGiven);
+ }
+ }
+
+ // check if the relation is the last piece of outer join
+ boolean endInnerRelation = false;
+ for (JoinEdge joinEdge: joinEdges) {
+ switch(joinEdge.getJoinType()) {
+ case LEFT_OUTER:
+ case RIGHT_OUTER:
+ case FULL_OUTER:
--- End diff --
We should also handle other join types, such as ```anti join``` and ```semi
join```, in addition to ```inner join``` and ```outer join```.
So I think the iteration must also stop when it finds an ```anti join```
or a ```semi join```.
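
To illustrate the suggestion, here is a minimal, self-contained sketch of a check that treats anti and semi joins the same way as outer joins. It is not the code from the pull request: the `JoinType` enum below is a hypothetical stand-in (only `LEFT_OUTER`, `RIGHT_OUTER`, and `FULL_OUTER` appear in the diff; the other constant names are assumptions), and `isReorderBarrier` and `hasReorderBarrier` are made-up helper names.

```java
import java.util.Arrays;
import java.util.List;

public class JoinTypeCheckSketch {
  // Hypothetical stand-in for the planner's join type enum.
  // Only the three *_OUTER names appear in the diff; the rest are assumed.
  public enum JoinType {
    INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER,
    LEFT_ANTI, RIGHT_ANTI, LEFT_SEMI, RIGHT_SEMI
  }

  // Returns true for join types that should not be freely reordered.
  public static boolean isReorderBarrier(JoinType type) {
    switch (type) {
      case LEFT_OUTER:
      case RIGHT_OUTER:
      case FULL_OUTER:
      case LEFT_ANTI:
      case RIGHT_ANTI:
      case LEFT_SEMI:
      case RIGHT_SEMI:
        return true;
      default:
        return false;
    }
  }

  // Stops iterating as soon as one barrier edge is found.
  public static boolean hasReorderBarrier(List<JoinType> edgeTypes) {
    for (JoinType type : edgeTypes) {
      if (isReorderBarrier(type)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(hasReorderBarrier(Arrays.asList(JoinType.INNER, JoinType.LEFT_SEMI)));
    System.out.println(hasReorderBarrier(Arrays.asList(JoinType.INNER, JoinType.INNER)));
  }
}
```

In the actual patch the same idea would presumably be expressed as extra `case` labels in the existing `switch` over `joinEdge.getJoinType()`.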
> GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of
> joins
> ----------------------------------------------------------------------------------
>
> Key: TAJO-1277
> URL: https://issues.apache.org/jira/browse/TAJO-1277
> Project: Tajo
> Issue Type: Bug
> Reporter: Keuntae Park
> Assignee: Keuntae Park
>
> It looks like GreedyHeuristicJoinOrderAlgorithm always assumes that every join
> is associative.
> The following query returns an inaccurate result:
> {code}
> select * FROM
> customer c
> right outer join nation n on c.c_custkey = n.n_nationkey
> join region r on c.c_custkey = r.r_regionkey;
> {code}
> because GreedyHeuristicJoinOrderAlgorithm changes the join order to
> {code}
> select * FROM
> customer c
> join region r on c.c_custkey = r.r_regionkey
> right outer join nation n on c.c_custkey = n.n_nationkey;
> {code}
> I think getBestPair() should be fixed to avoid wrong join ordering.
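
The difference between the two join orders quoted above can be reproduced with a small in-memory simulation. This is only a sketch under stated assumptions: each table is reduced to its single join key, the key values are invented toy data rather than real `customer`, `nation`, or `region` rows, and the class and method names (`JoinOrderSketch`, `originalOrder`, `reorderedOrder`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinOrderSketch {
  // A result row is {c_custkey, n_nationkey, r_regionkey}; null means null-padded.

  // (customer RIGHT OUTER JOIN nation) JOIN region
  public static List<Integer[]> originalOrder(int[] customers, int[] nations, int[] regions) {
    List<Integer[]> customerNation = new ArrayList<>();
    for (int n : nations) {
      boolean matched = false;
      for (int c : customers) {
        if (c == n) {
          customerNation.add(new Integer[] {c, n, null});
          matched = true;
        }
      }
      if (!matched) {
        // Right outer join keeps the nation row and pads the customer side.
        customerNation.add(new Integer[] {null, n, null});
      }
    }
    List<Integer[]> result = new ArrayList<>();
    for (Integer[] row : customerNation) {
      for (int r : regions) {
        // Inner join on c_custkey = r_regionkey drops null-padded rows.
        if (row[0] != null && row[0] == r) {
          result.add(new Integer[] {row[0], row[1], r});
        }
      }
    }
    return result;
  }

  // (customer JOIN region) RIGHT OUTER JOIN nation
  public static List<Integer[]> reorderedOrder(int[] customers, int[] nations, int[] regions) {
    List<Integer[]> customerRegion = new ArrayList<>();
    for (int c : customers) {
      for (int r : regions) {
        if (c == r) {
          customerRegion.add(new Integer[] {c, null, r});
        }
      }
    }
    List<Integer[]> result = new ArrayList<>();
    for (int n : nations) {
      boolean matched = false;
      for (Integer[] row : customerRegion) {
        if (row[0] == n) {
          result.add(new Integer[] {row[0], n, row[2]});
          matched = true;
        }
      }
      if (!matched) {
        // Every nation row survives, even without a matching customer.
        result.add(new Integer[] {null, n, null});
      }
    }
    return result;
  }

  public static void main(String[] args) {
    int[] customers = {1, 2, 3};  // toy c_custkey values
    int[] nations = {1, 2, 10};   // toy n_nationkey values
    int[] regions = {1, 5};       // toy r_regionkey values
    System.out.println("original order rows:  " + originalOrder(customers, nations, regions).size());
    System.out.println("reordered order rows: " + reorderedOrder(customers, nations, regions).size());
  }
}
```

With this toy data the original order yields one row, while the reordered plan yields three, because the right outer join applied last preserves every nation row, including those the inner join would have filtered out.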