[ https://issues.apache.org/jira/browse/DRILL-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941724#comment-15941724 ]
ASF GitHub Bot commented on DRILL-5375:
---------------------------------------
Github user arina-ielchiieva commented on a diff in the pull request:
https://github.com/apache/drill/pull/794#discussion_r108036357
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java ---
@@ -214,26 +226,62 @@ private boolean hasMore(IterOutcome outcome) {
/**
* Method generates the runtime code needed for NLJ. Other than the
setup method to set the input and output value
- * vector references we implement two more methods
- * 1. emitLeft() -> Project record from the left side
- * 2. emitRight() -> Project record from the right side (which is a
hyper container)
+ * vector references we implement three more methods
+ * 1. doEval() -> Evaluates if record from left side matches record from
the right side
+ * 2. emitLeft() -> Project record from the left side
+ * 3. emitRight() -> Project record from the right side (which is a
hyper container)
* @return the runtime generated class that implements the
NestedLoopJoin interface
- * @throws IOException
- * @throws ClassTransformationException
*/
- private NestedLoopJoin setupWorker() throws IOException,
ClassTransformationException {
- final CodeGenerator<NestedLoopJoin> nLJCodeGenerator =
CodeGenerator.get(NestedLoopJoin.TEMPLATE_DEFINITION,
context.getFunctionRegistry(), context.getOptions());
+ private NestedLoopJoin setupWorker() throws IOException,
ClassTransformationException, SchemaChangeException {
+ final CodeGenerator<NestedLoopJoin> nLJCodeGenerator =
CodeGenerator.get(
+ NestedLoopJoin.TEMPLATE_DEFINITION, context.getFunctionRegistry(),
context.getOptions());
nLJCodeGenerator.plainJavaCapable(true);
// Uncomment out this line to debug the generated code.
// nLJCodeGenerator.saveCodeForDebugging(true);
final ClassGenerator<NestedLoopJoin> nLJClassGenerator =
nLJCodeGenerator.getRoot();
+ // generate doEval
+ final ErrorCollector collector = new ErrorCollectorImpl();
+
+
+ /*
+ Logical expression may contain fields from left and right batches.
During code generation (materialization)
+ we need to indicate from which input field should be taken.
Mapping sets can work with only one input at a time.
+ But non-equality expressions can be complex:
+ select t1.c1, t2.c1, t2.c2 from t1 inner join t2 on t1.c1
between t2.c1 and t2.c2
+ or even contain self join which can not be transformed into filter
since OR clause is present
+ select *from t1 inner join t2 on t1.c1 >= t2.c1 or t1.c3 <> t1.c4
+
+ In this case logical expression can not be split according to
input presence (like during equality joins
--- End diff --
The point is that a non-equality join is not only one whose condition
contains something like `t1.c3 <> t1.c4`, but also one whose condition contains `OR`.
For example, the query `select * from t1 inner join t2
on t1.c1 = t2.c1 or t1.c2 = t2.c2` currently fails with the following error:
`UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due to
either a cartesian join or an inequality join`.
The main idea of my comment is that I don't care whether it is an equality or
an inequality join. I just materialize the whole expression with fields from both
inputs, and to tell which input a field comes from I add a batch indication. If you
want, I can remove the comment if it's confusing.
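To make the "batch indication" idea concrete, here is a minimal sketch in plain Java. It is not Drill's code generation or materialization API: `TaggedFieldSketch`, `FieldRef`, `Input`, `eq`, `row` and `matches` are all hypothetical names invented for illustration. It only shows that a condition such as `t1.c1 = t2.c1 or t1.c2 = t2.c2` can be kept as one expression when every field reference records which input it reads from.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Illustrative sketch only: every name here is hypothetical, none of it is Drill's API.
final class TaggedFieldSketch {

  // Which input batch a field reference reads from.
  enum Input { LEFT, RIGHT }

  // A column reference tagged with its input (the "batch indication").
  static final class FieldRef {
    final Input input;
    final String name;

    FieldRef(Input input, String name) {
      this.input = input;
      this.name = name;
    }

    Object read(Map<String, Object> left, Map<String, Object> right) {
      return (input == Input.LEFT ? left : right).get(name);
    }
  }

  // Builds "a = b" as a predicate over a (left row, right row) pair.
  // A null on either side never matches, mirroring SQL equality.
  static BiPredicate<Map<String, Object>, Map<String, Object>> eq(FieldRef a, FieldRef b) {
    return (left, right) -> {
      Object value = a.read(left, right);
      return value != null && value.equals(b.read(left, right));
    };
  }

  // t1.c1 = t2.c1 OR t1.c2 = t2.c2, kept as one expression over both inputs.
  static final BiPredicate<Map<String, Object>, Map<String, Object>> CONDITION =
      eq(new FieldRef(Input.LEFT, "c1"), new FieldRef(Input.RIGHT, "c1"))
          .or(eq(new FieldRef(Input.LEFT, "c2"), new FieldRef(Input.RIGHT, "c2")));

  // Convenience builder for a two-column row.
  static Map<String, Object> row(Object c1, Object c2) {
    Map<String, Object> r = new HashMap<>();
    r.put("c1", c1);
    r.put("c2", c2);
    return r;
  }

  // Evaluates the whole join condition for one pair of rows.
  static boolean matches(Map<String, Object> left, Map<String, Object> right) {
    return CONDITION.test(left, right);
  }
}
```

Because the `OR` ties both inputs together, the condition cannot be split into a left-only and a right-only part, so the sketch evaluates it per row pair, which is what a nested loop join does.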
> Nested loop join: return correct result for left join
> -----------------------------------------------------
>
> Key: DRILL-5375
> URL: https://issues.apache.org/jira/browse/DRILL-5375
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Labels: doc-impacting
>
> Mini repro:
> 1. Create 2 Hive tables with data
> {code}
> CREATE TABLE t1 (
> FYQ varchar(999),
> dts varchar(999),
> dte varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> 2016-Q1,2016-06-01,2016-09-30
> 2016-Q2,2016-09-01,2016-12-31
> 2016-Q3,2017-01-01,2017-03-31
> 2016-Q4,2017-04-01,2017-06-30
> CREATE TABLE t2 (
> who varchar(999),
> event varchar(999),
> dt varchar(999)
> )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> aperson,did somthing,2017-01-06
> aperson,did somthing else,2017-01-12
> aperson,had chrsitmas,2016-12-26
> aperson,went wild,2016-01-01
> {code}
> 2. Impala Query shows correct result
> {code}
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +------------+---------+---------+-------------------+
> | dt | fyq | who | event |
> +------------+---------+---------+-------------------+
> | 2016-01-01 | NULL | aperson | went wild |
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +------------+---------+---------+-------------------+
> {code}
> 3. Drill query shows wrong results:
> {code}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> use hive;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> +-------------+----------+----------+--------------------+
> | dt | fyq | who | event |
> +-------------+----------+----------+--------------------+
> | 2016-12-26 | 2016-Q2 | aperson | had chrsitmas |
> | 2017-01-06 | 2016-Q3 | aperson | did somthing |
> | 2017-01-12 | 2016-Q3 | aperson | did somthing else |
> +-------------+----------+----------+--------------------+
> 3 rows selected (2.523 seconds)
> {code}
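For reference, the expected left join semantics can be sketched in plain Java using the data from the repro above. This is only an illustration, not Drill's NestedLoopJoinBatch: the class and method names (`LeftNestedLoopJoinSketch`, `leftJoin`) are hypothetical, rows are modelled as string arrays, and the `BETWEEN` comparison relies on the dates being ISO-formatted strings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: hypothetical names, not Drill's implementation.
final class LeftNestedLoopJoinSketch {

  // t1 rows: {fyq, dts, dte}
  static final String[][] T1 = {
    {"2016-Q1", "2016-06-01", "2016-09-30"},
    {"2016-Q2", "2016-09-01", "2016-12-31"},
    {"2016-Q3", "2017-01-01", "2017-03-31"},
    {"2016-Q4", "2017-04-01", "2017-06-30"},
  };

  // t2 rows: {who, event, dt}
  static final String[][] T2 = {
    {"aperson", "did somthing", "2017-01-06"},
    {"aperson", "did somthing else", "2017-01-12"},
    {"aperson", "had chrsitmas", "2016-12-26"},
    {"aperson", "went wild", "2016-01-01"},
  };

  // t2 LEFT JOIN t1 ON t2.dt BETWEEN t1.dts AND t1.dte, ordered by dt.
  // Each output row is "dt|fyq|who|event"; fyq is "NULL" when nothing matched.
  static List<String> leftJoin(String[][] left, String[][] right) {
    List<String> out = new ArrayList<>();
    for (String[] l : left) {
      boolean matched = false;
      for (String[] r : right) {
        // ISO dates compare correctly as plain strings.
        if (l[2].compareTo(r[1]) >= 0 && l[2].compareTo(r[2]) <= 0) {
          out.add(String.join("|", l[2], r[0], l[0], l[1]));
          matched = true;
        }
      }
      // A left join must still emit left rows that matched nothing.
      if (!matched) {
        out.add(String.join("|", l[2], "NULL", l[0], l[1]));
      }
    }
    out.sort(Comparator.naturalOrder()); // dt is the leading field
    return out;
  }

  public static void main(String[] args) {
    leftJoin(T2, T1).forEach(System.out::println);
  }
}
```

The `if (!matched)` branch is the part that distinguishes a left join from an inner join: it is what produces the `2016-01-01` row with a NULL `fyq`, the row missing from the Drill output above.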