[
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941003#comment-15941003
]
ASF GitHub Bot commented on DRILL-5375:
---------------------------------------
Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/794#discussion_r107960020
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
---
@@ -105,6 +103,29 @@
public static final PositiveLongValidator
PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD = new
PositiveLongValidator(PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD_KEY,
Long.MAX_VALUE, 10000);
+ /*
+ Enables rules that re-write query joins in the most optimal way.
+ Though its turned on be default and its value in query optimization
is undeniable, user may want turn off such
+ optimization to leave join order indicated in sql query unchanged.
+
+ For example:
+ Currently only nested loop join allows non-equi join conditions
usage.
+ During planning stage nested loop join will be chosen when non-equi
join is detected
+ and {@link #NLJOIN_FOR_SCALAR} set to false. Though query
performance may not be the most optimal in such case,
+ user may use such workaround to execute queries with non-equi joins.
+
+ Nested loop join allows only INNER and LEFT join usage and implies
that right input is smaller that left input.
+ During LEFT join when join optimization is enabled and detected that
right input is larger that left,
+ join will be optimized: left and right inputs will be flipped and
LEFT join type will be changed to RIGHT one.
+ If query contains non-equi joins, after such optimization it will
fail, since nested loop does not allow
+ RIGHT join. In this case if user accepts probability of non optimal
performance, he may turn off join optimization.
+ Turning off join optimization, makes sense only if user are not sure
that right output is less or equal to left,
+ otherwise join optimization can be left turned on.
+
+ Note: once hash and merge joins will allow non-equi join conditions,
+ the need to turn off join optimization may go away.
+ */
+ public static final BooleanValidator JOIN_OPTIMIZATION = new
BooleanValidator("planner.enable_join_optimization", true);
--- End diff --
The name 'join_optimization' is misleading since that term is quite broad
and includes both ordering of multiple joins and the left vs right inputs of a
join. Here we are talking about the latter only and for a specific join type.
For HashJoins, we have a planner option 'enable_hashjoin_swap'. It sounds to
me that your new option is targeted for the same thing for nested loop join.
Would be better to call it 'enable_nljoin_swap'. Does that convey the intent
you want or is there something more ?
> Nested loop join: return correct result for left join
> -----------------------------------------------------
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
> FYQ varchar(999),
> dts varchar(999),
> dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
> who varchar(999),
> event varchar(999),
> dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +------------+---------+---------+-------------------+
> | dt | fyq | who | event |
> +------------+---------+---------+-------------------+
> | 2016-01-01 | NULL | aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +------------+---------+---------+-------------------+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-------------+----------+----------+--------------------+
> | dt | fyq | who | event |
> +-------------+----------+----------+--------------------+
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +-------------+----------+----------+--------------------+
> 3 rows selected (2.523 seconds)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)