[ 
https://issues.apache.org/jira/browse/TAJO-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317413#comment-14317413
 ] 

ASF GitHub Bot commented on TAJO-1277:
--------------------------------------

Github user sirpkt commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/379#discussion_r24553012
  
    --- Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java ---
    @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
         JoinEdge bestPair;
     
         while (remainRelations.size() > 1) {
    +      Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
    +
    +      for (LogicalNode relation : remainRelations) {
    +        Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
    +        List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
    +        String relationCollection = TUtil.collectionToString(relationStrings, ",");
    +        List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
    +        if (joinEdgesForGiven != null) {
    +          joinEdges.addAll(joinEdgesForGiven);
    +        }
    +        for (String relationString: relationStrings) {
    --- End diff --
    
    Oh, it's my mistake.
    When relationStrings has only one entry, this code may add that entry twice.
    
    When a LogicalNode contains two relations, for example A and B, the code above first finds the joinEdges whose right relation is "A, B", the key obtained from TUtil.collectionToString().
    Next, it finds the joinEdges whose right relation is "A" or "B", via 'for (String relationString: relationStrings)'.
    So, if a LogicalNode contains just one relation, both lookups use the same key and the code may add that relation repeatedly.
    
    The duplicated relation does not affect the result, but I'll patch the code so that it does not produce duplicated relations.
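
The duplication described above can be reproduced outside Tajo with a small sketch. Everything here is an assumption for illustration: the class name, the method names, and the plain `Map` that stands in for the join graph are hypothetical and are not Tajo's API. The sketch only mirrors the two-step lookup pattern from the diff (combined key first, then each relation) and shows one possible way to avoid repeats, by collecting into a `LinkedHashSet`. The actual patch may take a different approach.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the join graph: maps a right-hand relation key
// (for example "A" or "A,B") to the names of its incoming join edges.
public class JoinEdgeLookupSketch {

  // Mirrors the lookup pattern in the diff: combined key first, then each relation.
  public static List<String> collectNaive(Map<String, List<String>> incoming,
                                          Collection<String> relations) {
    List<String> edges = new ArrayList<>();
    String combinedKey = String.join(",", relations);
    edges.addAll(incoming.getOrDefault(combinedKey, Collections.<String>emptyList()));
    for (String relation : relations) {
      edges.addAll(incoming.getOrDefault(relation, Collections.<String>emptyList()));
    }
    return edges;
  }

  // Same lookups, but a LinkedHashSet drops repeats while keeping insertion order.
  public static List<String> collectDeduplicated(Map<String, List<String>> incoming,
                                                 Collection<String> relations) {
    Set<String> edges = new LinkedHashSet<>();
    String combinedKey = String.join(",", relations);
    edges.addAll(incoming.getOrDefault(combinedKey, Collections.<String>emptyList()));
    for (String relation : relations) {
      edges.addAll(incoming.getOrDefault(relation, Collections.<String>emptyList()));
    }
    return new ArrayList<>(edges);
  }

  public static void main(String[] args) {
    Map<String, List<String>> incoming = new HashMap<>();
    incoming.put("A", Collections.singletonList("X->A"));

    // With a single relation, the combined key equals the relation name,
    // so the naive version looks up "A" twice.
    List<String> single = Collections.singletonList("A");
    System.out.println(collectNaive(incoming, single));        // [X->A, X->A]
    System.out.println(collectDeduplicated(incoming, single)); // [X->A]
  }
}
```

Skipping the combined-key lookup when the collection has exactly one entry would be an alternative to the set-based approach.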


> GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins
> ----------------------------------------------------------------------------------
>
>                 Key: TAJO-1277
>                 URL: https://issues.apache.org/jira/browse/TAJO-1277
>             Project: Tajo
>          Issue Type: Bug
>            Reporter: Keuntae Park
>            Assignee: Keuntae Park
>
> It looks like GreedyHeuristicJoinOrderAlgorithm always assumes that every join is associative.
> The following query returns an inaccurate result:
> {code}
> select * FROM
> customer c 
> right outer join nation n on c.c_custkey = n.n_nationkey
> join region r on c.c_custkey = r.r_regionkey;
> {code}
> because GreedyHeuristicJoinOrderAlgorithm changes the join order to
> {code}
> select * FROM
> customer c 
> join region r on c.c_custkey = r.r_regionkey
> right outer join nation n on c.c_custkey = n.n_nationkey;
> {code}
> I think getBestPair() should be fixed to avoid wrong join ordering. 
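
To make the difference between the two join orders in the issue description concrete, here is a minimal sketch that evaluates both with nested loops over tiny in-memory key lists. The class name, method names, and sample keys are all assumptions chosen for illustration. Nothing here is Tajo code, and the tables are reduced to their join key columns.

```java
import java.util.ArrayList;
import java.util.List;

// Rows are {c_custkey, n_nationkey, r_regionkey}; null marks a column
// padded by the outer join or not yet joined.
public class JoinOrderSketch {

  // (customer RIGHT OUTER JOIN nation) JOIN region, as the query is written.
  public static List<Integer[]> originalOrder(int[] customer, int[] nation, int[] region) {
    // Stage 1: customer RIGHT OUTER JOIN nation ON c_custkey = n_nationkey
    List<Integer[]> cn = new ArrayList<>();
    for (int n : nation) {
      boolean matched = false;
      for (int c : customer) {
        if (c == n) {
          cn.add(new Integer[] {c, n, null});
          matched = true;
        }
      }
      if (!matched) {
        cn.add(new Integer[] {null, n, null}); // nation row without a customer
      }
    }
    // Stage 2: inner JOIN region ON c_custkey = r_regionkey
    List<Integer[]> out = new ArrayList<>();
    for (Integer[] row : cn) {
      for (int r : region) {
        if (row[0] != null && row[0] == r) { // a null c_custkey never matches
          out.add(new Integer[] {row[0], row[1], r});
        }
      }
    }
    return out;
  }

  // (customer JOIN region) RIGHT OUTER JOIN nation, the reordered form.
  public static List<Integer[]> reorderedOrder(int[] customer, int[] nation, int[] region) {
    // Stage 1: customer JOIN region ON c_custkey = r_regionkey
    List<Integer[]> cr = new ArrayList<>();
    for (int c : customer) {
      for (int r : region) {
        if (c == r) {
          cr.add(new Integer[] {c, null, r});
        }
      }
    }
    // Stage 2: RIGHT OUTER JOIN nation ON c_custkey = n_nationkey
    List<Integer[]> out = new ArrayList<>();
    for (int n : nation) {
      boolean matched = false;
      for (Integer[] row : cr) {
        if (row[0] == n) {
          out.add(new Integer[] {row[0], n, row[2]});
          matched = true;
        }
      }
      if (!matched) {
        out.add(new Integer[] {null, n, null}); // unmatched nation row is preserved
      }
    }
    return out;
  }

  public static void main(String[] args) {
    int[] customer = {1};
    int[] nation = {1, 2};
    int[] region = {1};
    System.out.println(originalOrder(customer, nation, region).size());  // 1
    System.out.println(reorderedOrder(customer, nation, region).size()); // 2
  }
}
```

With these sample keys, the original order drops the padded row for nation 2 at the inner join, while the reordered form keeps it because the right outer join runs last.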



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
