Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/794#discussion_r109193083
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java
---
@@ -214,26 +226,62 @@ private boolean hasMore(IterOutcome outcome) {
/**
* Method generates the runtime code needed for NLJ. Other than the
setup method to set the input and output value
- * vector references we implement two more methods
- * 1. emitLeft() -> Project record from the left side
- * 2. emitRight() -> Project record from the right side (which is a
hyper container)
+ * vector references we implement three more methods
+ * 1. doEval() -> Evaluates if record from left side matches record from
the right side
+ * 2. emitLeft() -> Project record from the left side
+ * 3. emitRight() -> Project record from the right side (which is a
hyper container)
* @return the runtime generated class that implements the
NestedLoopJoin interface
- * @throws IOException
- * @throws ClassTransformationException
*/
- private NestedLoopJoin setupWorker() throws IOException,
ClassTransformationException {
- final CodeGenerator<NestedLoopJoin> nLJCodeGenerator =
CodeGenerator.get(NestedLoopJoin.TEMPLATE_DEFINITION,
context.getFunctionRegistry(), context.getOptions());
+ private NestedLoopJoin setupWorker() throws IOException,
ClassTransformationException, SchemaChangeException {
+ final CodeGenerator<NestedLoopJoin> nLJCodeGenerator =
CodeGenerator.get(
+ NestedLoopJoin.TEMPLATE_DEFINITION, context.getFunctionRegistry(),
context.getOptions());
nLJCodeGenerator.plainJavaCapable(true);
// Uncomment out this line to debug the generated code.
// nLJCodeGenerator.saveCodeForDebugging(true);
final ClassGenerator<NestedLoopJoin> nLJClassGenerator =
nLJCodeGenerator.getRoot();
+ // generate doEval
+ final ErrorCollector collector = new ErrorCollectorImpl();
+
+
+ /*
+ Logical expression may contain fields from left and right batches.
During code generation (materialization)
+ we need to indicate from which input field should be taken.
Mapping sets can work with only one input at a time.
+ But non-equality expressions can be complex:
+ select t1.c1, t2.c1, t2.c2 from t1 inner join t2 on t1.c1
between t2.c1 and t2.c2
+ or even contain self join which can not be transformed into filter
since OR clause is present
+ select *from t1 inner join t2 on t1.c1 >= t2.c1 or t1.c3 <> t1.c4
+
+ In this case logical expression can not be split according to
input presence (like during equality joins
--- End diff --
To avoid confusion, you could list a couple of example categories:
1. Join on non-equijoin predicates: t1 inner join t2 on (t1.c1 between
t2.c1 AND t2.c2) AND (...)
2. Join with an OR predicate: t1 inner join t2 on t1.c1 = t2.c1 OR t1.c2
= t2.c2
The other category, where the join predicate includes a self-join condition,
could probably be left out since it has quite a few variations. If there are 2
tables but the join condition references only 1 of them, the result is a
cartesian join with the second table. A self-join condition combined with AND
would also be treated differently from one combined with OR, etc.
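
As a rough illustration of the two categories (a sketch only, not Drill code: the class name, the int[] row layout with row[0] = c1 and row[1] = c2, and the helper names are all made up for this example), a plain nested loop join has to evaluate the whole predicate against each left/right pair, because both predicates reference columns from both inputs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch; rows are int[] with row[0] = c1 and row[1] = c2.
class NljPredicateSketch {

  // Category 1: t1.c1 BETWEEN t2.c1 AND t2.c2 (non-equijoin predicate).
  static final BiPredicate<int[], int[]> BETWEEN =
      (l, r) -> l[0] >= r[0] && l[0] <= r[1];

  // Category 2: t1.c1 = t2.c1 OR t1.c2 = t2.c2 (OR predicate).
  static final BiPredicate<int[], int[]> OR_EQ =
      (l, r) -> l[0] == r[0] || l[1] == r[1];

  // Plain nested loop join: every left row is tested against every right row.
  // Returns the matching pairs as "leftIndex,rightIndex" strings.
  static List<String> nestedLoopJoin(List<int[]> left, List<int[]> right,
                                     BiPredicate<int[], int[]> condition) {
    List<String> matches = new ArrayList<>();
    for (int i = 0; i < left.size(); i++) {
      for (int j = 0; j < right.size(); j++) {
        if (condition.test(left.get(i), right.get(j))) {
          matches.add(i + "," + j);
        }
      }
    }
    return matches;
  }
}
```

Calling nestedLoopJoin(t1Rows, t2Rows, NljPredicateSketch.BETWEEN) or passing OR_EQ instead runs the same loop with a different condition, which is roughly the role doEval() plays in the generated code.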
---