[ 
https://issues.apache.org/jira/browse/TAJO-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313736#comment-14313736
 ] 

ASF GitHub Bot commented on TAJO-1277:
--------------------------------------

Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/379#discussion_r24393231
  
    --- Diff: tajo-plan/src/main/java/org/apache/tajo/plan/joinorder/GreedyHeuristicJoinOrderAlgorithm.java ---
    @@ -57,17 +54,69 @@ public FoundJoinOrder findBestOrder(LogicalPlan plan, LogicalPlan.QueryBlock blo
         JoinEdge bestPair;
     
         while (remainRelations.size() > 1) {
    +      Set<LogicalNode> checkingRelations = new LinkedHashSet<LogicalNode>();
    +
    +      for (LogicalNode relation : remainRelations) {
    +        Collection <String> relationStrings = PlannerUtil.getRelationLineageWithinQueryBlock(plan, relation);
    +        List<JoinEdge> joinEdges = new ArrayList<JoinEdge>();
    +        String relationCollection = TUtil.collectionToString(relationStrings, ",");
    +        List<JoinEdge> joinEdgesForGiven = joinGraph.getIncomingEdges(relationCollection);
    +        if (joinEdgesForGiven != null) {
    +          joinEdges.addAll(joinEdgesForGiven);
    +        }
    +        for (String relationString: relationStrings) {
    +          joinEdgesForGiven = joinGraph.getIncomingEdges(relationString);
    +          if (joinEdgesForGiven != null) {
    +            joinEdges.addAll(joinEdgesForGiven);
    +          }
    +        }
    +
    +        // check if the relation is the last piece of outer join
    +        boolean endInnerRelation = false;
    +        for (JoinEdge joinEdge: joinEdges) {
    --- End diff --
    
    Here, joinEdges seems to be sorted in the order in which the joins occur in the user query.
    How can we guarantee that joinEdges is always sorted in that order?


> GreedyHeuristicJoinOrderAlgorithm sometimes wrongly assumes associativity of joins
> ----------------------------------------------------------------------------------
>
>                 Key: TAJO-1277
>                 URL: https://issues.apache.org/jira/browse/TAJO-1277
>             Project: Tajo
>          Issue Type: Bug
>            Reporter: Keuntae Park
>            Assignee: Keuntae Park
>
> It looks like GreedyHeuristicJoinOrderAlgorithm always assumes that every join 
> is associative.
> The following query returns an inaccurate result:
> {code}
> select * FROM
> customer c 
> right outer join nation n on c.c_custkey = n.n_nationkey
> join region r on c.c_custkey = r.r_regionkey;
> {code}
> because GreedyHeuristicJoinOrderAlgorithm changes the join order to
> {code}
> select * FROM
> customer c 
> join region r on c.c_custkey = r.r_regionkey
> right outer join nation n on c.c_custkey = n.n_nationkey;
> {code}
> I think getBestPair() should be fixed to avoid wrong join ordering. 
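
To illustrate why the two join orders quoted above are not equivalent, here is a minimal sketch that evaluates both plans by hand over toy in-memory tables. It does not use any Tajo code: the class name OuterJoinOrderDemo, the helper methods, and the key values are all assumptions made up for this example, and each table is reduced to its single join column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OuterJoinOrderDemo {
  // Toy single-column tables; the key values are invented for illustration.
  static final List<Integer> CUSTOMER = Arrays.asList(1, 2); // c_custkey
  static final List<Integer> NATION = Arrays.asList(1, 3);   // n_nationkey
  static final List<Integer> REGION = Arrays.asList(1);      // r_regionkey

  // A row is [c_custkey, n_nationkey, r_regionkey]; null marks a column
  // that is not joined yet or that was padded by the outer join.
  static List<Integer> row(Integer c, Integer n, Integer r) {
    return Arrays.asList(c, n, r);
  }

  // (customer RIGHT OUTER JOIN nation) JOIN region: the order written in the query.
  public static List<List<Integer>> originalOrder() {
    List<List<Integer>> outer = new ArrayList<>();
    for (Integer n : NATION) {
      boolean matched = false;
      for (Integer c : CUSTOMER) {
        if (c.equals(n)) {
          outer.add(row(c, n, null));
          matched = true;
        }
      }
      if (!matched) {
        outer.add(row(null, n, null)); // keep the unmatched nation row
      }
    }
    List<List<Integer>> result = new ArrayList<>();
    for (List<Integer> cn : outer) {
      for (Integer r : REGION) {
        // A null c_custkey never satisfies the inner join predicate.
        if (cn.get(0) != null && cn.get(0).equals(r)) {
          result.add(row(cn.get(0), cn.get(1), r));
        }
      }
    }
    return result;
  }

  // (customer JOIN region) RIGHT OUTER JOIN nation: the reordered plan.
  public static List<List<Integer>> reorderedPlan() {
    List<List<Integer>> inner = new ArrayList<>();
    for (Integer c : CUSTOMER) {
      for (Integer r : REGION) {
        if (c.equals(r)) {
          inner.add(row(c, null, r));
        }
      }
    }
    List<List<Integer>> result = new ArrayList<>();
    for (Integer n : NATION) {
      boolean matched = false;
      for (List<Integer> cr : inner) {
        if (cr.get(0).equals(n)) {
          result.add(row(cr.get(0), n, cr.get(2)));
          matched = true;
        }
      }
      if (!matched) {
        result.add(row(null, n, null)); // outer join is applied last, so padding survives
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println("original:  " + originalOrder());
    System.out.println("reordered: " + reorderedPlan());
  }
}
```

With this toy data the original order yields only the fully matched row, while the reordered plan additionally keeps a null-padded row for nation key 3. In the original query the inner join on c.c_custkey filters out the padding introduced by the outer join; once the outer join is moved to the end, nothing removes it.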



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
