Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1101#discussion_r165156589

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
    @@ -215,6 +206,7 @@ public BatchHolder() {
               MaterializedField outputField = materializedValueFields[i];
               // Create a type-specific ValueVector for this value
               vector = TypeHelper.getNewVector(outputField, allocator);
    +          int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
    --- End diff --

    @ilooner That is the point. If we know the exact value, why do we need RecordBatchSizer? We should use RecordBatchSizer when we need sizing information for a batch (in most cases, the incoming batch). In this case, you are allocating memory for the value vectors of the batch you are building.

    For fixed width columns, you can get the column width for each type you are allocating memory for using TypeHelper.getSize. For variable width columns, TypeHelper.getSize assumes 50 bytes. If you want to adjust the memory you allocate for variable width columns in the outgoing batch based on the incoming batch, that is when you use RecordBatchSizer on the actual incoming batch to figure out the average size of that column. You can also use RecordBatchSizer on the incoming batch to figure out how many values to allocate memory for in the outgoing batch.

    Note that, with your change, for just-created value vectors, variable width columns will return an estSize of 1, which is not what you want.
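The pitfall being pointed out can be sketched in plain Java. This is an illustrative sketch only: `estimateColumnWidth` is a hypothetical helper, not Drill's actual RecordBatchSizer code, but it mirrors why an average-bytes-per-value estimate is only meaningful for a batch that already holds data.

```java
public class ColumnSizeSketch {

    // Hypothetical helper: average bytes per value observed in a batch,
    // rounded up and clamped to a minimum of 1. A just-created, empty
    // vector has no data, so the estimate degenerates to 1 -- useless
    // for deciding how much memory to allocate for the outgoing batch.
    static int estimateColumnWidth(int totalDataBytes, int valueCount) {
        if (valueCount == 0) {
            return 1; // empty vector: degenerate estimate
        }
        return Math.max(1, (totalDataBytes + valueCount - 1) / valueCount);
    }

    public static void main(String[] args) {
        // Incoming batch with real data: 42,000 bytes over 1,000 values.
        System.out.println(estimateColumnWidth(42_000, 1_000)); // prints 42
        // Freshly created vector: estimate collapses to 1.
        System.out.println(estimateColumnWidth(0, 0)); // prints 1
    }
}
```

This is why sizing from the actual incoming batch (which has data) gives a usable average, while sizing the vectors of the batch you are still building does not.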
---