Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184264961
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java 
---
    @@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) {
        */
       private int netRowWidth;
       private int netRowWidthCap50;
    +
    +  /**
    +   * actual row size if input is not empty. Otherwise, standard size.
    +   */
    +  private int rowAllocSize;
    --- End diff --
    
    I see. In this case, however, arrays (repeated values) will be empty. If we 
have 10 such rows, there is no reason to allocate 50 "inner" values. Also, for 
VarChar, no values will be stored; all columns will be null. (If we are 
handling non-nullable columns, then each VarChar will be an empty string.)
    
    So, we probably need a bit of a special case: prepare data for a run of 
null rows (with arrays and VarChars of length 0) vs. take our best guess, with 
no knowledge at all about lengths, when values may be non-empty.
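
    To illustrate the distinction, here is a minimal sketch (the class, method, 
and constant names are hypothetical, not Drill's actual API): for a known 
all-null run, variable-width and repeated columns need no data/inner-value 
space, while with no length information we fall back to standard estimates.

    ```java
    // Hypothetical sketch of the two allocation strategies discussed above.
    public class AllocSizeSketch {

      static final int STD_VARCHAR_WIDTH = 50;  // assumed standard per-value estimate

      // Bytes to reserve for a VarChar column's data buffer.
      static int varCharAllocBytes(int rowCount, boolean allNullRun) {
        if (allNullRun) {
          // Null (or non-nullable empty-string) values store no data bytes;
          // only offsets and null bits are needed.
          return 0;
        }
        // No knowledge of lengths: fall back to the standard width.
        return rowCount * STD_VARCHAR_WIDTH;
      }

      // Inner-value count to reserve for a repeated (array) column.
      static int repeatedAllocValues(int rowCount, boolean allNullRun,
                                     int stdCardinality) {
        // Empty arrays need no inner values; otherwise guess std cardinality.
        return allNullRun ? 0 : rowCount * stdCardinality;
      }

      public static void main(String[] args) {
        // 10 all-null rows: no data bytes, no inner values.
        System.out.println(varCharAllocBytes(10, true));        // 0
        System.out.println(repeatedAllocValues(10, true, 5));   // 0
        // 10 rows with unknown lengths: standard estimates.
        System.out.println(varCharAllocBytes(10, false));       // 500
        System.out.println(repeatedAllocValues(10, false, 5));  // 50
      }
    }
    ```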
    
    Probably not a huge issue if you only need to handle a single row. But 
creating a batch with only one row will cause all kinds of performance issues 
downstream. (I found that out the hard way when a bug in sort produced a series 
of one-row batches...)

