Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1059#discussion_r158593956
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java ---
    @@ -305,11 +307,15 @@ public void executeBuildPhase() throws SchemaChangeException, ClassTransformatio
         //Setup the underlying hash table
     
         // skip first batch if count is zero, as it may be an empty schema batch
    -    if (right.getRecordCount() == 0) {
    +    if (isFurtherProcessingRequired(rightUpstream) && right.getRecordCount() == 0) {
           for (final VectorWrapper<?> w : right) {
             w.clear();
           }
           rightUpstream = next(right);
    +      if (isFurtherProcessingRequired(rightUpstream) &&
    +          right.getRecordCount() > 0 && hashTable == null) {
    +        setupHashTable();
    --- End diff --
    
    This handles an empty batch followed by a non-empty batch. Can we be sure 
that there will only ever be a sequence of 0 or 1 empty batches? Might there be 
a pathological scan that reads 20 (say) empty files, producing a series of 20 
empty batches? In short, should the logic here be in a loop?
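    A loop-based version might look roughly like the self-contained sketch below. It uses a stand-in `Batch` class and a `skipEmptyBatches` helper (hypothetical names, not Drill's actual API) purely to illustrate draining an arbitrary run of empty batches rather than just one:
    
    ```java
    import java.util.ArrayDeque;
    import java.util.Deque;
    
    public class DrainEmptyBatches {
        // Stand-in for an incoming record batch; in Drill this role is played
        // by the 'right' RecordBatch together with next(right)/getRecordCount().
        static final class Batch {
            final int recordCount;
            Batch(int recordCount) { this.recordCount = recordCount; }
        }
    
        // Drains any run of leading empty batches, so a pathological scan
        // producing, say, 20 empty batches is handled, not just a single one.
        static Batch skipEmptyBatches(Deque<Batch> upstream) {
            Batch current = upstream.poll();
            // A while-loop, not a single 'if': keep consuming while empty.
            while (current != null && current.recordCount == 0) {
                // In Drill, the empty batch's vectors would be cleared here.
                current = upstream.poll();
            }
            return current; // first non-empty batch, or null if the stream ended
        }
    
        public static void main(String[] args) {
            Deque<Batch> upstream = new ArrayDeque<>();
            for (int i = 0; i < 20; i++) {
                upstream.add(new Batch(0)); // 20 empty batches from empty files
            }
            upstream.add(new Batch(5));     // then one non-empty batch
    
            Batch first = skipEmptyBatches(upstream);
            System.out.println(first == null ? "null" : first.recordCount);
        }
    }
    ```
    
    The point of the sketch is only the control flow: the single `if` in the patch handles exactly one empty batch, while a loop handles any number.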
    
    Did we create a test that checks for this case?

