[jira] [Commented] (DRILL-6089) Validate That Planner Does Not Assume HashJoin Preserves Ordering for FS, MaprDB, or Hive

ASF GitHub Bot (JIRA) Mon, 12 Feb 2018 15:50:09 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361606#comment-16361606
 ]


ASF GitHub Bot commented on DRILL-6089:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1117#discussion_r167722292
  
    --- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/join/TestHashJoinAdvanced.java
 ---
    @@ -197,4 +199,14 @@ public void emptyPartTest() throws Exception {
           BaseTestQuery.resetSessionOption(ExecConstants.SLICE_TARGET);
         }
       }
    +
    +  @Test // DRILL-6089
    +  public void testJoinOrdering() throws Exception {
    +    final String query = "select * from dfs.`sample-data/nation.parquet` nation left outer join " +
    --- End diff --
    
    I missed this in the prior review, but I think the ORDER BY should be on a 
column coming from the left input of the join (assuming that the nation table is 
the left input). The reason is that the original code was using 'convertedLeft' to 
propagate the collation trait from the probe (left) side, so the test should 
match that scenario. Also, you don't necessarily need the 
subquery; you could do `select * from nation.parquet nation left outer 
join region.parquet region on ... order by nation.n_name desc`. 


> Validate That Planner Does Not Assume HashJoin Preserves Ordering for FS, 
> MaprDB, or Hive
> -----------------------------------------------------------------------------------------
>
>                 Key: DRILL-6089
>                 URL: https://issues.apache.org/jira/browse/DRILL-6089
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
>
>
> Explanation provided by Boaz:
> (As explained in the design document) The new "automatic spill" feature of 
> the Hash-Join operator may cause (if spilling occurs) the rows from the 
> left/probe side to be returned in a different order than their incoming order 
> (due to splitting the rows into partitions).
> Currently the Drill planner assumes that left-order is preserved by the 
> Hash-Join operator; therefore, if this is not changed, a query relying on that 
> order may return wrong results (when the Hash-Join spills).
> A fix is needed. Here are a few options (ordered from the simplest to the 
> most complex):
>  # Change the order rule in the planner. Thus whenever an order is needed 
> above (downstream of) the Hash-Join, the planner would add a sort operator. 
> That would waste a lot of execution time.
>  # When the planner needs the left-order above the Hash-Join, it may assess 
> the size of the right/build side (this needs statistics). If the right side is 
> small enough, the planner would set an option for the runtime to avoid 
> spilling, hence preserving the left-side order. In case spilling becomes 
> necessary, the code would return an error (possibly with a message suggesting 
> setting some special option and retrying; the special option would add a sort 
> operator and allow the hash-join to spill).
>  # When generating the code for the fragment above the Hash-Join (where 
> left-order should be maintained) - at code-gen time check if the hash-join 
> below spilled, and if so, add a sort operator. (Nothing like that exists in 
> Drill now, so it may be complicated).
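
To illustrate the reordering described in the explanation above, here is a minimal sketch. It is not Drill code: the class `SpillOrderSketch`, the method `probeThroughPartitions`, and the modulo-based partitioning are all hypothetical stand-ins, assumed only for illustration. It routes already-sorted probe-side keys into partitions by hash and then emits the partitions one at a time, which is roughly what happens to left-side rows once a hash join has to spill.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Drill.
class SpillOrderSketch {

  // Routes each probe-side key to a partition by hash, then emits the
  // partitions one after another (as a spilling join would process them).
  static List<Integer> probeThroughPartitions(List<Integer> probeKeys, int numPartitions) {
    List<List<Integer>> partitions = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      partitions.add(new ArrayList<>());
    }
    for (int key : probeKeys) {
      int target = Math.floorMod(Integer.hashCode(key), numPartitions);
      partitions.get(target).add(key);
    }
    List<Integer> emitted = new ArrayList<>();
    for (List<Integer> partition : partitions) {
      emitted.addAll(partition);
    }
    return emitted;
  }

  public static void main(String[] args) {
    List<Integer> incoming = Arrays.asList(1, 2, 3, 4, 5, 6); // sorted probe side
    List<Integer> emitted = probeThroughPartitions(incoming, 2);
    System.out.println("incoming: " + incoming);
    System.out.println("emitted:  " + emitted); // [2, 4, 6, 1, 3, 5]
  }
}
```

Order is kept within each partition but not across partitions, so a downstream operator that relies on the incoming left-side order would see unsorted rows unless a sort is added above the join.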



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
