[GitHub] drill pull request #794: DRILL-5375: Nested loop join: return correct result...

amansinha100 Fri, 31 Mar 2017 09:21:16 -0700

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/794#discussion_r109193083
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinBatch.java ---
    @@ -214,26 +226,62 @@ private boolean hasMore(IterOutcome outcome) {
     
       /**
        * Method generates the runtime code needed for NLJ. Other than the setup method to set the input and output value
    -   * vector references we implement two more methods
    -   * 1. emitLeft()  -> Project record from the left side
    -   * 2. emitRight() -> Project record from the right side (which is a hyper container)
    +   * vector references we implement three more methods
    +   * 1. doEval() -> Evaluates if record from left side matches record from the right side
    +   * 2. emitLeft() -> Project record from the left side
    +   * 3. emitRight() -> Project record from the right side (which is a hyper container)
        * @return the runtime generated class that implements the NestedLoopJoin interface
    -   * @throws IOException
    -   * @throws ClassTransformationException
        */
    -  private NestedLoopJoin setupWorker() throws IOException, ClassTransformationException {
    -    final CodeGenerator<NestedLoopJoin> nLJCodeGenerator = CodeGenerator.get(NestedLoopJoin.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
    +  private NestedLoopJoin setupWorker() throws IOException, ClassTransformationException, SchemaChangeException {
    +    final CodeGenerator<NestedLoopJoin> nLJCodeGenerator = CodeGenerator.get(
    +        NestedLoopJoin.TEMPLATE_DEFINITION, context.getFunctionRegistry(), context.getOptions());
         nLJCodeGenerator.plainJavaCapable(true);
         // Uncomment out this line to debug the generated code.
     //    nLJCodeGenerator.saveCodeForDebugging(true);
         final ClassGenerator<NestedLoopJoin> nLJClassGenerator = nLJCodeGenerator.getRoot();
     
    +    // generate doEval
    +    final ErrorCollector collector = new ErrorCollectorImpl();
    +
    +
    +    /*
    +        Logical expression may contain fields from left and right batches. During code generation (materialization)
    +        we need to indicate from which input field should be taken. Mapping sets can work with only one input at a time.
    +        But non-equality expressions can be complex:
    +          select t1.c1, t2.c1, t2.c2 from t1 inner join t2 on t1.c1 between t2.c1 and t2.c2
    +        or even contain self join which can not be transformed into filter since OR clause is present
    +          select *from t1 inner join t2 on t1.c1 >= t2.c1 or t1.c3 <> t1.c4
    +
    +        In this case logical expression can not be split according to input presence (like during equality joins
    --- End diff --
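To make the division of labour in the Javadoc concrete, here is a minimal interpreted sketch of how `doEval()`, `emitLeft()` and `emitRight()` could fit into the join loop. It assumes rows are plain `int[]` arrays rather than Drill value vectors, and the class `NljSketch` and its methods are hypothetical; in the PR this logic is generated at runtime through `CodeGenerator`, so this is not the PR's implementation. Because the evaluation step receives the left and right row together, a predicate such as `t1.c1 between t2.c1 and t2.c2` never has to be split by input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch, NOT Drill's generated NestedLoopJoin class.
// Rows are modeled as int[] instead of Drill value vectors.
public class NljSketch {

  // Stand-in for the generated doEval(): decides whether a left row matches
  // a right row. It sees both rows, so it may mix columns from either side.
  private final BiPredicate<int[], int[]> doEval;

  public NljSketch(BiPredicate<int[], int[]> doEval) {
    this.doEval = doEval;
  }

  // Stand-in for emitLeft()/emitRight(): copy the left columns, then the
  // right columns, into one output row.
  private static int[] emit(int[] left, int[] right) {
    int[] out = new int[left.length + right.length];
    System.arraycopy(left, 0, out, 0, left.length);              // emitLeft
    System.arraycopy(right, 0, out, left.length, right.length);  // emitRight
    return out;
  }

  // Inner nested loop join: every left row is compared with every right row.
  public List<int[]> join(List<int[]> leftRows, List<int[]> rightRows) {
    List<int[]> output = new ArrayList<>();
    for (int[] left : leftRows) {
      for (int[] right : rightRows) {
        if (doEval.test(left, right)) {
          output.add(emit(left, right));
        }
      }
    }
    return output;
  }

  public static void main(String[] args) {
    // t1.c1 BETWEEN t2.c1 AND t2.c2: t1 has column c1; t2 has columns c1, c2.
    NljSketch nlj = new NljSketch((l, r) -> l[0] >= r[0] && l[0] <= r[1]);
    List<int[]> t1 = List.of(new int[] {1}, new int[] {5}, new int[] {9});
    List<int[]> t2 = List.of(new int[] {0, 4}, new int[] {4, 8});
    for (int[] row : nlj.join(t1, t2)) {
      System.out.println(Arrays.toString(row));  // prints [1, 0, 4] then [5, 4, 8]
    }
  }
}
```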
    
    To avoid confusion, you could list a couple of example categories:
    1. Join on non-equijoin predicates: t1 inner join t2 on (t1.c1 between t2.c1 AND t2.c2) AND (...)
    2. Join with an OR predicate: t1 inner join t2 on t1.c1 = t2.c1 OR t1.c2 = t2.c2
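
Both categories share the property that sends them to a nested loop join: the predicate cannot be reduced to a single equality key per input, so a plain hash join on one key is not enough. The sketch below (hypothetical class `OrPredicateJoin`, rows modeled as `int[] {c1, c2}`, not Drill code) compares a hash join keyed on `c1` alone with a nested loop that evaluates the full `t1.c1 = t2.c1 OR t1.c2 = t2.c2` predicate; the hash join misses the pair that matches only on `c2`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Drill code. Rows are int[] {c1, c2}.
public class OrPredicateJoin {

  // Hash join on t1.c1 = t2.c1 only: build on t2.c1, probe with t1.c1.
  static int hashJoinOnC1(List<int[]> t1, List<int[]> t2) {
    Map<Integer, List<int[]>> build = new HashMap<>();
    for (int[] r : t2) {
      build.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r);
    }
    int matches = 0;
    for (int[] l : t1) {
      matches += build.getOrDefault(l[0], List.of()).size();
    }
    return matches;
  }

  // Nested loop join evaluating t1.c1 = t2.c1 OR t1.c2 = t2.c2 for every pair.
  static int nestedLoopOr(List<int[]> t1, List<int[]> t2) {
    int matches = 0;
    for (int[] l : t1) {
      for (int[] r : t2) {
        if (l[0] == r[0] || l[1] == r[1]) {
          matches++;
        }
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<int[]> t1 = List.of(new int[] {1, 10}, new int[] {2, 20});
    List<int[]> t2 = List.of(new int[] {1, 99}, new int[] {3, 20});
    System.out.println(hashJoinOnC1(t1, t2));  // 1: only {1,10} with {1,99}
    System.out.println(nestedLoopOr(t1, t2));  // 2: also {2,20} with {3,20} via c2
  }
}
```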
    
    The other category, where a join predicate includes a self-join, could probably be left out since there are quite a few variations there: if there are 2 tables but the join condition only references 1 table, then it would be a cartesian join with the second table; if the self-join condition occurs in combination with an AND, it would be treated differently than in combination with an OR, etc.
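The AND versus OR distinction can be seen with the same predicate shape the diff uses, `t1.c1 >= t2.c1` combined with the t1-only condition `t1.c3 <> t1.c4`. In this sketch (hypothetical class `SingleTablePredicate`; t1 rows are `int[] {c1, c3, c4}`, t2 rows are `int[] {c1}`), the AND form gives the same count whether the t1-only conjunct is evaluated inside the join or applied first as a filter on t1, while under OR every t1 row satisfying `t1.c3 <> t1.c4` pairs with every t2 row, i.e. a cartesian join for those rows. It illustrates the logic only and does not show how Drill's planner actually rewrites such predicates.

```java
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration, not Drill code.
// t1 rows are int[] {c1, c3, c4}; t2 rows are int[] {c1}.
public class SingleTablePredicate {

  // Counts the pairs accepted by a nested loop join with the given ON condition.
  static int nlj(List<int[]> t1, List<int[]> t2, BiPredicate<int[], int[]> on) {
    int count = 0;
    for (int[] l : t1) {
      for (int[] r : t2) {
        if (on.test(l, r)) {
          count++;
        }
      }
    }
    return count;
  }

  // Applies a t1-only predicate as a filter below the join, then joins on the rest.
  static int filterThenJoin(List<int[]> t1, List<int[]> t2,
                            Predicate<int[]> t1Only, BiPredicate<int[], int[]> rest) {
    List<int[]> filtered = t1.stream().filter(t1Only).collect(Collectors.toList());
    return nlj(filtered, t2, rest);
  }

  static final BiPredicate<int[], int[]> JOIN = (l, r) -> l[0] >= r[0];  // t1.c1 >= t2.c1
  static final Predicate<int[]> T1_ONLY = l -> l[1] != l[2];              // t1.c3 <> t1.c4

  public static void main(String[] args) {
    List<int[]> t1 = List.of(new int[] {5, 1, 1}, new int[] {4, 1, 2});
    List<int[]> t2 = List.of(new int[] {3}, new int[] {7});

    // AND: the t1-only conjunct can act as a filter on t1; both counts are 1.
    System.out.println(nlj(t1, t2, (l, r) -> JOIN.test(l, r) && T1_ONLY.test(l)));
    System.out.println(filterThenJoin(t1, t2, T1_ONLY, JOIN));

    // OR: no such filter exists; row {4, 1, 2} satisfies t1.c3 <> t1.c4 and
    // pairs with every t2 row, so the total is 3.
    System.out.println(nlj(t1, t2, (l, r) -> JOIN.test(l, r) || T1_ONLY.test(l)));
  }
}
```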



