Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184244877
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) {
        */
       private int netRowWidth;
       private int netRowWidthCap50;
    +
    +  /**
    +   * actual row size if input is not empty. Otherwise, standard size.
    +   */
    +  private int rowAllocSize;
    --- End diff --
    
    Unless I'm missing something, we can't move forward on a join if one side 
is empty: we won't know if we have the rows we need. Consider a merge join 
(simplest). The left gets some data, but the right is empty. We can't proceed 
unless the right has hit EOF. Otherwise, we don't know if we have a match or not 
for the first left row.
    
    We need to read another right batch and keep going until we either hit EOF 
(no matching rows) or get some data.
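    
    A minimal sketch of that "keep reading until data or EOF" loop, in case it helps. `Batch` and `nextNonEmpty` are hypothetical stand-ins for illustration, not Drill classes or methods, and a plain `Iterator` stands in for the upstream operator:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SkipEmptyBatches {
  /** Hypothetical stand-in for an incoming record batch; not a Drill class. */
  static final class Batch {
    final List<Integer> keys;
    Batch(Integer... keys) { this.keys = Arrays.asList(keys); }
    boolean isEmpty() { return keys.isEmpty(); }
  }

  /**
   * Returns the next non-empty batch, or null once the input is
   * exhausted (EOF). Empty batches are skipped without being sized.
   */
  static Batch nextNonEmpty(Iterator<Batch> input) {
    while (input.hasNext()) {
      Batch b = input.next();
      if (!b.isEmpty()) {
        return b;
      }
    }
    return null; // EOF: no more rows on this side
  }

  public static void main(String[] args) {
    Iterator<Batch> right =
        Arrays.asList(new Batch(), new Batch(), new Batch(3, 5)).iterator();
    Batch first = nextNonEmpty(right);
    System.out.println(first.keys);          // [3, 5]
    System.out.println(nextNonEmpty(right)); // null
  }
}
```

    Only the batch returned here (or the EOF signal) would ever feed into sizing, so an empty batch never needs an allocation estimate of its own.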
    
    Once we have some data, we can go row-by-row to see if we have a left-only, 
right-only, or matching set of rows. If we get to EOF on either side, we know 
that there are no matches for the other side.
    
    What we do in the no-match case depends on whether we are doing LEFT OUTER, 
RIGHT OUTER or an INNER join.
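    
    To make the row-by-row step concrete, here is a rough sketch of the merge over two sorted inputs. It assumes unique integer keys on each side and works on plain arrays; `MergeJoinSketch`, `JoinType` and `join` are illustrative names, not Drill's merge join implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class MergeJoinSketch {
  enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER }

  /** Joins two sorted key arrays (unique keys assumed on each side). */
  static List<String> join(int[] left, int[] right, JoinType type) {
    List<String> out = new ArrayList<>();
    int l = 0;
    int r = 0;
    while (l < left.length && r < right.length) {
      if (left[l] == right[r]) {
        out.add(left[l] + "=" + right[r]);   // matching rows
        l++;
        r++;
      } else if (left[l] < right[r]) {
        if (type == JoinType.LEFT_OUTER) {
          out.add(left[l] + "=null");        // left-only row
        }
        l++;
      } else {
        if (type == JoinType.RIGHT_OUTER) {
          out.add("null=" + right[r]);       // right-only row
        }
        r++;
      }
    }
    // EOF on one side: whatever remains on the other side has no match.
    if (type == JoinType.LEFT_OUTER) {
      for (; l < left.length; l++) {
        out.add(left[l] + "=null");
      }
    }
    if (type == JoinType.RIGHT_OUTER) {
      for (; r < right.length; r++) {
        out.add("null=" + right[r]);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    int[] left = {1, 2, 4};
    int[] right = {2, 3};
    System.out.println(join(left, right, JoinType.INNER));       // [2=2]
    System.out.println(join(left, right, JoinType.LEFT_OUTER));  // [1=null, 2=2, 4=null]
    System.out.println(join(left, right, JoinType.RIGHT_OUTER)); // [2=2, null=3]
  }
}
```

    Note that the loop cannot emit anything for a left row until it has a right row to compare against, or knows the right side is at EOF.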
    
    The point is, we can't make progress until we get that non-empty right 
batch (in this example). So, no reason to allocate space based on an empty 
batch (unless the entire input is empty) because we'll need to find a non-empty 
(or EOF) batch anyway.

