Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1228#discussion_r184264961

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -536,6 +556,11 @@ public ColumnSize getColumn(String name) {
        */
       private int netRowWidth;
       private int netRowWidthCap50;
    +
    +  /**
    +   * actual row size if input is not empty. Otherwise, standard size.
    +   */
    +  private int rowAllocSize;
    --- End diff --

    I see. In this case, however, arrays (repeated values) will be empty. If we have 10 such rows, there is no reason to allocate space for 50 "inner" values. Also, for VarChar, no values will be stored; all columns will be null. (If we are handling non-nullable columns, then the non-null VarChar will be an empty string.)

    So, we probably need a bit of a special case: prepare data for a run of null rows (with arrays and VarChars of length 0) versus taking our best guess, with no knowledge at all about lengths, when the input may be non-empty.

    This is probably not a huge issue if you only need to handle a single row. But creating a batch with only one row will cause all kinds of performance issues downstream. (I found that out the hard way when a bug in sort produced a series of one-row batches...)
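    To make the two modes concrete, here is a minimal sketch in Java. The class, method, and parameter names are hypothetical, not part of RecordBatchSizer; it assumes only the distinction discussed above between a known run of null rows and a best-guess allocation:

        /**
         * Hypothetical sketch, not the actual RecordBatchSizer API:
         * chooses a per-column allocation size depending on whether the
         * batch is known to contain only null rows.
         */
        public class AllocSizeSketch {

          /**
           * @param stdWidth       standard (estimated) per-row width in bytes
           * @param perRowOverhead fixed per-row bookkeeping (offset/null-bit vectors)
           * @param rowCount       number of rows to allocate for
           * @param allNull        true if every row is known to be null
           * @return bytes to allocate for this column
           */
          public static int allocSize(int stdWidth, int perRowOverhead,
                                      int rowCount, boolean allNull) {
            if (allNull) {
              // Run of null rows: repeated values are empty arrays and
              // VarChars hold zero-length data, so only the fixed
              // per-row bookkeeping space is needed.
              return perRowOverhead * rowCount;
            }
            // No length information: fall back to the standard estimate
            // so downstream operators see reasonably sized batches
            // rather than a series of tiny (e.g., one-row) ones.
            return stdWidth * rowCount;
          }

          public static void main(String[] args) {
            // 10 null rows of a repeated VarChar: no inner values needed.
            System.out.println(allocSize(50, 4, 10, true));   // 40
            // Same column, lengths unknown: use the standard width.
            System.out.println(allocSize(50, 4, 10, false));  // 500
          }
        }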
---