[ https://issues.apache.org/jira/browse/DRILL-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360012#comment-16360012 ]

ASF GitHub Bot commented on DRILL-6089:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1117#discussion_r167441417
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java ---
    @@ -197,4 +198,24 @@ public void emptyPartTest() throws Exception {
           BaseTestQuery.resetSessionOption(ExecConstants.SLICE_TARGET);
         }
       }
    +
    +  @Test // DRILL-6089
    +  public void testJoinOrdering() throws Exception {
    +    final String query = "select * from dfs.`sample-data/nation.parquet` nation left outer join " +
    +      "(select * from dfs.`sample-data/region.parquet`) " +
    +      "as region on region.r_regionkey = nation.n_nationkey order by region.r_name desc";
    +    final String plan = getPlanInString("EXPLAIN PLAN for " + QueryTestUtil.normalizeQuery(query), OPTIQ_FORMAT);
    +    lastSortAfterJoin(plan);
    --- End diff --
    
    Most of our plan tests use one of the utility methods in PlanTestBase
    (from which JoinTestBase is derived), which rely on Java's regex Pattern
    and Matcher classes.  In your query, is it necessary to compare the index
    of the Sort with that of the HashJoin?  Since only one Sort is expected
    (corresponding to the final ORDER BY), I think a regex pattern that matches
    a Sort followed by a HashJoin (in regex terms, something like
    `.*Sort.*HashJoin.*`) would be sufficient.  You might want to look at the
    callers of [1] to see whether it satisfies your requirement.
    
    [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/PlanTestBase.java#L82
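
As a rough illustration of the regex-based check suggested above, here is a minimal, self-contained sketch using only java.util.regex. It does not call any PlanTestBase method; the class name, the helper name, and the sample plan text are all hypothetical, and the plan lines only imitate the general shape of an EXPLAIN output. DOTALL mode is assumed so that `.` can span the line breaks between operators.

```java
import java.util.regex.Pattern;

public class PlanOrderCheck {
  // Hypothetical pattern: a Sort that appears somewhere before a HashJoin.
  // DOTALL lets '.' match the newlines that separate plan operators.
  private static final Pattern SORT_BEFORE_HASH_JOIN =
      Pattern.compile("Sort.*HashJoin", Pattern.DOTALL);

  // Returns true if the plan text contains a Sort followed later by a HashJoin.
  static boolean sortAppearsBeforeHashJoin(String plan) {
    return SORT_BEFORE_HASH_JOIN.matcher(plan).find();
  }

  public static void main(String[] args) {
    // Made-up plan text for illustration only; not real Drill output.
    String plan = String.join("\n",
        "00-00    Screen",
        "00-01      Sort(sort0=[$1], dir0=[DESC])",
        "00-02        HashJoin(condition=[=($0, $2)], joinType=[left])",
        "00-03          Scan(...)",
        "00-04          Scan(...)");
    System.out.println(sortAppearsBeforeHashJoin(plan)); // prints "true"
  }
}
```

Because only one Sort is expected in the plan, a single match of this kind would be enough to show that the Sort sits above (is printed before) the HashJoin, without comparing string indexes by hand.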


> Validate That Planner Does Not Assume HashJoin Preserves Ordering for FS, 
> MaprDB, or Hive
> -----------------------------------------------------------------------------------------
>
>                 Key: DRILL-6089
>                 URL: https://issues.apache.org/jira/browse/DRILL-6089
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
>
>
> Explanation provided by Boaz:
> (As explained in the design document) The new "automatic spill" feature of 
> the Hash-Join operator may cause (if spilling occurs) the rows from the 
> left/probe side to be returned in a different order than their incoming order 
> (due to splitting the rows into partitions).
> Currently the Drill planner assumes that left-order is preserved by the 
> Hash-Join operator; therefore, if this is not changed, a query relying on 
> that order may return wrong results (when the Hash-Join spills).
> A fix is needed. Here are a few options (ordered from the simplest to the 
> most complex):
>  # Change the order rule in the planner. Thus whenever an order is needed 
> above (downstream of) the Hash-Join, the planner would add a sort operator. 
> That would waste a lot of execution time.
>  # When the planner needs the left-order above the Hash-Join, it may assess 
> the size of the right/build side (this needs statistics). If the right side is 
> small enough, the planner would set an option for the runtime to avoid 
> spilling, hence preserving the left-side order. In case spilling becomes 
> necessary, the code would return an error (possibly with a message suggesting 
> setting some special option and retrying; the special option would add a sort 
> operator and allow the hash-join to spill).
>  # When generating the code for the fragment above the Hash-Join (where 
> left-order should be maintained) - at code-gen time check if the hash-join 
> below spilled, and if so, add a sort operator. (Nothing like that exists in 
> Drill now, so it may be complicated).
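
The reordering described above can be sketched with a toy example. This is not Drill's Hash-Join code: the class name, the method name, and the modulo-based partitioning are assumptions made purely for illustration. Rows arrive sorted on the probe side, are split into partitions by a hash of their key, and are then emitted one partition at a time, as might happen when partitions are spilled and processed separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionOrderDemo {
  // Splits the incoming keys into numPartitions buckets by hash, then emits
  // the buckets one after another. Order is kept inside a bucket but not
  // across buckets, so the overall incoming order is lost.
  static List<Integer> emitByPartition(List<Integer> incoming, int numPartitions) {
    List<List<Integer>> partitions = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      partitions.add(new ArrayList<>());
    }
    for (int key : incoming) {
      int p = Math.floorMod(Integer.hashCode(key), numPartitions);
      partitions.get(p).add(key);
    }
    List<Integer> out = new ArrayList<>();
    for (List<Integer> partition : partitions) {
      out.addAll(partition);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> sortedProbeSide = Arrays.asList(1, 2, 3, 4, 5, 6);
    // Prints [2, 4, 6, 1, 3, 5]: the sorted input order is not preserved.
    System.out.println(emitByPartition(sortedProbeSide, 2));
  }
}
```

A downstream operator that assumed the output was still sorted would be wrong here, which is why the first option above has the planner add a sort whenever an order is required above the Hash-Join.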



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)