[
https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941726#comment-15941726
]
ASF GitHub Bot commented on DRILL-5375:
---------------------------------------
Github user arina-ielchiieva commented on a diff in the pull request:
https://github.com/apache/drill/pull/794#discussion_r108035986
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PlannerSettings.java
---
@@ -105,6 +103,29 @@
public static final PositiveLongValidator
PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD = new
PositiveLongValidator(PARQUET_ROWGROUP_FILTER_PUSHDOWN_PLANNING_THRESHOLD_KEY,
Long.MAX_VALUE, 10000);
+ /*
+ Enables rules that re-write query joins in the most optimal way.
+ Though its turned on be default and its value in query optimization
is undeniable, user may want turn off such
+ optimization to leave join order indicated in sql query unchanged.
+
+ For example:
+ Currently only nested loop join allows non-equi join conditions
usage.
+ During planning stage nested loop join will be chosen when non-equi
join is detected
+ and {@link #NLJOIN_FOR_SCALAR} set to false. Though query
performance may not be the most optimal in such case,
+ user may use such workaround to execute queries with non-equi joins.
+
+ Nested loop join allows only INNER and LEFT join usage and implies
that right input is smaller that left input.
+ During LEFT join when join optimization is enabled and detected that
right input is larger that left,
+ join will be optimized: left and right inputs will be flipped and
LEFT join type will be changed to RIGHT one.
+ If query contains non-equi joins, after such optimization it will
fail, since nested loop does not allow
+ RIGHT join. In this case if user accepts probability of non optimal
performance, he may turn off join optimization.
+ Turning off join optimization, makes sense only if user are not sure
that right output is less or equal to left,
+ otherwise join optimization can be left turned on.
+
+ Note: once hash and merge joins will allow non-equi join conditions,
+ the need to turn off join optimization may go away.
+ */
+ public static final BooleanValidator JOIN_OPTIMIZATION = new
BooleanValidator("planner.enable_join_optimization", true);
--- End diff --
JOIN_OPTIMIZATION enables two rules `DRILL_JOIN_TO_MULTIJOIN_RULE` and
`DRILL_LOPT_OPTIMIZE_JOIN_RULE` which are applicable for any types of joins.
That's why naming is quite broad, I believe these two rules are not only
responsible for join swap but for all other join optimization techniques. In
our use case, user may want to disable them when he doesn't won't join swap to
be performed but there may other reasons. Though as I have noted, when we
implement non-equality joins for hash and merge joins, we may remove this
configuration parameter.
> Nested loop join: return correct result for left join
> -----------------------------------------------------
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
> FYQ varchar(999),
> dts varchar(999),
> dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
> who varchar(999),
> event varchar(999),
> dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +------------+---------+---------+-------------------+
> | dt | fyq | who | event |
> +------------+---------+---------+-------------------+
> | 2016-01-01 | NULL | aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +------------+---------+---------+-------------------+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-------------+----------+----------+--------------------+
> | dt | fyq | who | event |
> +-------------+----------+----------+--------------------+
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +-------------+----------+----------+--------------------+
> 3 rows selected (2.523 seconds)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)