[ https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951207#comment-15951207 ]

ASF GitHub Bot commented on DRILL-5375:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/794#discussion_r109197541
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java ---
    @@ -105,6 +103,29 @@
       public static final PositiveLongValidator PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD = new PositiveLongValidator(PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD_KEY,
           Long.MAX_VALUE, 10000);
     
    +  /*
    +     Enables rules that rewrite the join order of a query into the most optimal form.
    +     Though it is turned on by default and its value in query optimization is undeniable, the user may want to turn off such
    +     optimization to leave the join order indicated in the SQL query unchanged.
    +
    +      For example:
    +      Currently only nested loop join allows non-equi join conditions.
    +      During the planning stage, nested loop join will be chosen when a non-equi join is detected
    +      and {@link #NLJOIN_FOR_SCALAR} is set to false. Though query performance may not be optimal in such a case,
    +      the user may use this workaround to execute queries with non-equi joins.
    +
    +      Nested loop join allows only INNER and LEFT joins and implies that the right input is smaller than the left input.
    +      During a LEFT join, when join optimization is enabled and the right input is detected to be larger than the left,
    +      the join will be optimized: the left and right inputs will be flipped and the LEFT join type will be changed to RIGHT.
    +      If the query contains non-equi joins, it will fail after such optimization, since nested loop join does not allow
    +      RIGHT joins. In this case, if the user accepts the possibility of non-optimal performance, they may turn off join optimization.
    +      Turning off join optimization makes sense only if the user is not sure that the right input is smaller than or equal to the left,
    +      otherwise join optimization can be left turned on.
    +
    +      Note: once hash and merge joins allow non-equi join conditions,
    +      the need to turn off join optimization may go away.
    +   */
    +  public static final BooleanValidator JOIN_OPTIMIZATION = new BooleanValidator("planner.enable_join_optimization", true);
    --- End diff --
    
    Ah, you added this option to enable/disable the *logical* join rules. Since NestedLoopJoin is a physical join implementation, I interpreted from the comments that this was intended for swapping the left and right inputs of the (physical) NL join, which is why I mentioned the hashjoin_swap option. It seems to me that if there is a LEFT OUTER JOIN and the condition is a non-equality, then we should not allow changing it to a Right Outer Join by flipping the left and right sides, since that would make the query fail. What do you think?
    I suppose we could keep your boolean option for this PR and address the left outer join issue separately.
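
To make the suggestion concrete, here is a rough sketch of the kind of guard I have in mind. It is not taken from the Drill code base: `JoinSwapSketch`, `JoinType`, `ConditionKind`, `flip`, `nestedLoopSupports` and `canSwapInputs` are all hypothetical names, and the sketch assumes that equi-joins can always be swapped because hash or merge join could execute them. The only facts it relies on from the diff comment are that nested loop join is the only operator accepting non-equi conditions and that it supports INNER and LEFT joins only.

```java
/** Hypothetical sketch; none of these names come from the Drill code base. */
final class JoinSwapSketch {

    /** Join types relevant to this discussion. */
    enum JoinType { INNER, LEFT, RIGHT }

    /** Whether the join condition is a pure equality or not. */
    enum ConditionKind { EQUI, NON_EQUI }

    /** Join type after the inputs are flipped: LEFT becomes RIGHT and vice versa. */
    static JoinType flip(JoinType type) {
        switch (type) {
            case LEFT:
                return JoinType.RIGHT;
            case RIGHT:
                return JoinType.LEFT;
            default:
                return type;
        }
    }

    /** Per the diff comment, nested loop join supports only INNER and LEFT joins. */
    static boolean nestedLoopSupports(JoinType type) {
        return type == JoinType.INNER || type == JoinType.LEFT;
    }

    /**
     * A swap is rejected only when the condition is non-equi (so nested loop join
     * is the only candidate) and nested loop join cannot run the flipped join type.
     * Assumption: equi-joins are always swappable, since other operators can run them.
     */
    static boolean canSwapInputs(JoinType type, ConditionKind kind) {
        if (kind == ConditionKind.EQUI) {
            return true;
        }
        return nestedLoopSupports(flip(type));
    }

    public static void main(String[] args) {
        // The case from DRILL-5375: LEFT join with a BETWEEN (non-equi) condition.
        System.out.println(canSwapInputs(JoinType.LEFT, ConditionKind.NON_EQUI));
        // An INNER non-equi join stays INNER after the flip, so swapping is fine.
        System.out.println(canSwapInputs(JoinType.INNER, ConditionKind.NON_EQUI));
    }
}
```

With a check like this in the swap path, the query in the issue would keep its LEFT join type even when the right input is larger, at the cost of a possibly less efficient nested loop join.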


> Nested loop join: return correct result for left join
> -----------------------------------------------------
>
>                 Key: DRILL-5375
>                 URL: https://issues.apache.org/jira/browse/DRILL-5375
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
>   FYQ varchar(999),
>   dts varchar(999),
>   dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
>   who varchar(999),
>   event varchar(999),
>   dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +------------+---------+---------+-------------------+
> | dt         | fyq     | who     | event             |
> +------------+---------+---------+-------------------+
> | 2016-01-01 | NULL    | aperson | went wild         |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas     |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing      |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +------------+---------+---------+-------------------+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-------------+----------+----------+--------------------+
> |     dt      |   fyq    |   who    |       event        |
> +-------------+----------+----------+--------------------+
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas      |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing       |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-------------+----------+----------+--------------------+
> 3 rows selected (2.523 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
