[ https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951207#comment-15951207 ]
ASF GitHub Bot commented on DRILL-5375: --------------------------------------- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/794#discussion_r109197541 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java --- @@ -105,6 +103,29 @@ public static final PositiveLongValidator PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD = new PositiveLongValidator(PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD_KEY, Long.MAX_VALUE, 10000); + /* + Enables rules that re-write query joins in the most optimal way. + Though its turned on be default and its value in query optimization is undeniable, user may want turn off such + optimization to leave join order indicated in sql query unchanged. + + For example: + Currently only nested loop join allows non-equi join conditions usage. + During planning stage nested loop join will be chosen when non-equi join is detected + and {@link #NLJOIN_FOR_SCALAR} set to false. Though query performance may not be the most optimal in such case, + user may use such workaround to execute queries with non-equi joins. + + Nested loop join allows only INNER and LEFT join usage and implies that right input is smaller that left input. + During LEFT join when join optimization is enabled and detected that right input is larger that left, + join will be optimized: left and right inputs will be flipped and LEFT join type will be changed to RIGHT one. + If query contains non-equi joins, after such optimization it will fail, since nested loop does not allow + RIGHT join. In this case if user accepts probability of non optimal performance, he may turn off join optimization. + Turning off join optimization, makes sense only if user are not sure that right output is less or equal to left, + otherwise join optimization can be left turned on. + + Note: once hash and merge joins will allow non-equi join conditions, + the need to turn off join optimization may go away. + */ + public static final BooleanValidator JOIN_OPTIMIZATION = new BooleanValidator("planner.enable_join_optimization", true); --- End diff -- Ah, you added this option to enable/disable the *logical* join rules. Since NestedLoopJoin is a physical join implementation, from the comments I interpreted that this was intended for the swapping of left and right inputs of the (physical) NL join, which is why I mentioned about hashjoin_swap option. It seems to me that if there is an LEFT OUTER JOIN and condition is non-equality, then we should not allow changing to a Right Outer Join by flipping the left and right sides, since that would make the query fail. What do you think ? I suppose we could keep your boolean option for this PR and address the left outer join issue separately. > Nested loop join: return correct result for left join > ----------------------------------------------------- > > Key: DRILL-5375 > URL: https://issues.apache.org/jira/browse/DRILL-5375 > Project: Apache Drill > Issue Type: Bug > Affects Versions: 1.8.0 > Reporter: Arina Ielchiieva > Assignee: Arina Ielchiieva > Labels: doc-impacting > > Mini repro: > 1. Create 2 Hive tables with data > {code} > CREATE TABLE t1 ( > FYQ varchar(999), > dts varchar(999), > dte varchar(999) > ) > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; > 2016-Q1,2016-06-01,2016-09-30 > 2016-Q2,2016-09-01,2016-12-31 > 2016-Q3,2017-01-01,2017-03-31 > 2016-Q4,2017-04-01,2017-06-30 > CREATE TABLE t2 ( > who varchar(999), > event varchar(999), > dt varchar(999) > ) > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; > aperson,did somthing,2017-01-06 > aperson,did somthing else,2017-01-12 > aperson,had chrsitmas,2016-12-26 > aperson,went wild,2016-01-01 > {code} > 2. Impala Query shows correct result > {code} > select t2.dt, t1.fyq, t2.who, t2.event > from t2 > left join t1 on t2.dt between t1.dts and t1.dte > order by t2.dt; > +------------+---------+---------+-------------------+ > | dt | fyq | who | event | > +------------+---------+---------+-------------------+ > | 2016-01-01 | NULL | aperson | went wild | > | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas | > | 2017-01-06 | 2016-Q3 | aperson | did somthing | > | 2017-01-12 | 2016-Q3 | aperson | did somthing else | > +------------+---------+---------+-------------------+ > {code} > 3. Drill query shows wrong results: > {code} > alter session set planner.enable_nljoin_for_scalar_only=false; > use hive; > select t2.dt, t1.fyq, t2.who, t2.event > from t2 > left join t1 on t2.dt between t1.dts and t1.dte > order by t2.dt; > +-------------+----------+----------+--------------------+ > | dt | fyq | who | event | > +-------------+----------+----------+--------------------+ > | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas | > | 2017-01-06 | 2016-Q3 | aperson | did somthing | > | 2017-01-12 | 2016-Q3 | aperson | did somthing else | > +-------------+----------+----------+--------------------+ > 3 rows selected (2.523 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)