Github user ilooner commented on a diff in the pull request:
https://github.com/apache/drill/pull/1101#discussion_r164611155
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
---
@@ -215,6 +206,7 @@ public BatchHolder() {
MaterializedField outputField = materializedValueFields[i];
// Create a type-specific ValueVector for this value
vector = TypeHelper.getNewVector(outputField, allocator);
+ int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
--- End diff ---
The goal of this code is to preallocate enough space to hold BATCH_SIZE
elements in the batch. The BatchHolder itself holds the aggregate values, and
aggregate values can currently only be stored in FixedWidth vectors or
ObjectVectors. Since we know how much direct memory each of these types
consumes per element, we can use that knowledge in our column size estimate and
preallocate the correct amount of space.
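To make that concrete, here is a minimal sketch of the idea, assuming the
allocator, materialized field, and batch size from the surrounding BatchHolder
code; it is illustrative only, not the exact code in this PR, and the
RecordBatchSizer import path may differ across Drill versions:

```java
import org.apache.drill.exec.expr.TypeHelper;
import org.apache.drill.exec.memory.BufferAllocator;
import org.apache.drill.exec.physical.impl.spill.RecordBatchSizer; // path may vary by version
import org.apache.drill.exec.record.MaterializedField;
import org.apache.drill.exec.vector.FixedWidthVector;
import org.apache.drill.exec.vector.ValueVector;

// Hypothetical helper: create a value column and reserve room for batchSize entries.
// Only the fixed-width case is shown; ObjectVector-backed aggregates are handled separately.
static ValueVector preallocateColumn(MaterializedField outputField,
                                     BufferAllocator allocator,
                                     int batchSize) {
  ValueVector vector = TypeHelper.getNewVector(outputField, allocator);
  // For a fixed-width type the sizer reports a constant per-entry width in bytes,
  // so the direct memory the column needs is simply columnSize * batchSize.
  int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
  // Reserve space for batchSize values up front instead of growing the buffer
  // incrementally while the batch fills.
  ((FixedWidthVector) vector).allocateNew(batchSize);
  return vector;
}
```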
**Note:** earlier the RecordBatchSizer would return an estSize of 0 for an
empty FixedWidth value vector. For example, an empty IntVector would return an
estSize of 0. This was incorrect behavior, so I updated the RecordBatchSizer to
return the correct size of a FixedWidth vector even when the value vector is
empty. Please see the changes in the RecordBatchSizer and ValueVector templates
for more details.
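For context, a minimal sketch of the corrected sizing behavior, assuming
TypeHelper.getSize() reports a type's fixed byte width; the actual
RecordBatchSizer change differs in detail:

```java
import org.apache.drill.exec.expr.TypeHelper;
import org.apache.drill.exec.vector.FixedWidthVector;
import org.apache.drill.exec.vector.ValueVector;

// Hypothetical per-entry size estimate: an empty fixed-width vector now reports
// its type width (e.g. 4 bytes for an IntVector) instead of 0.
static int estimateEntrySize(ValueVector v, int totalDataBytes, int valueCount) {
  if (v instanceof FixedWidthVector) {
    // Fixed-width types have a constant width known from the type itself,
    // so the estimate does not depend on how many values are present.
    return TypeHelper.getSize(v.getField().getType());
  }
  // Variable-width columns: fall back to the observed average width.
  return valueCount == 0 ? 0 : totalDataBytes / valueCount;
}
```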
---