Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1101#discussion_r166411194
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
    @@ -733,28 +780,32 @@ private void restoreReservedMemory() {
        * @param records
        */
       private void allocateOutgoing(int records) {
    -    // Skip the keys and only allocate for outputting the workspace values
    -    // (keys will be output through splitAndTransfer)
    -    Iterator<VectorWrapper<?>> outgoingIter = outContainer.iterator();
    -    for (int i = 0; i < numGroupByOutFields; i++) {
    -      outgoingIter.next();
    -    }
    -
         // try to preempt an OOM by using the reserved memory
         useReservedOutgoingMemory();
         long allocatedBefore = allocator.getAllocatedMemory();
     
    -    while (outgoingIter.hasNext()) {
    +    for (int columnIndex = numGroupByOutFields; columnIndex < outContainer.getNumberOfColumns(); columnIndex++) {
    +      final VectorWrapper wrapper = outContainer.getValueVector(columnIndex);
           @SuppressWarnings("resource")
    -      ValueVector vv = outgoingIter.next().getValueVector();
    +      final ValueVector vv = wrapper.getValueVector();
     
    -      AllocationHelper.allocatePrecomputedChildCount(vv, records, maxColumnWidth, 0);
    +      final RecordBatchSizer.ColumnSize columnSizer = new RecordBatchSizer.ColumnSize(wrapper.getValueVector());
    +      int columnSize;
    +
    +      if (columnSizer.hasKnownSize()) {
    +        // For fixed width vectors we know the size of each record
    +        columnSize = columnSizer.getKnownSize();
    +      } else {
    +        // For var chars we need to use the input estimate
    +        columnSize = varcharValueSizes.get(columnIndex);
    +      }
    +
    +      AllocationHelper.allocatePrecomputedChildCount(vv, records, columnSize, 0);
    --- End diff ---
    
    Hmm. The element count from the sizer would tell us the number of rows in
    the incoming batch, but that will not give us an accurate prediction of the
    number of keys we have aggregated values for. Maybe we should use the number
    of keys in the hash table instead.
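
    To make that concrete, here is a minimal, self-contained sketch (plain Java,
    not the Drill API; the row count, key count, and 50-byte VarChar estimate are
    made-up assumptions) of how much sizing a variable-width outgoing column by
    the incoming row count can over-allocate compared with sizing it by the
    number of keys actually held in the hash table:

    ```java
    // Illustration only: the numbers below are assumptions, and none of this is Drill API.
    public class OutgoingSizingSketch {

      // Bytes needed for one variable-width column of the outgoing batch.
      static long bytesNeeded(int valueCount, int bytesPerValue) {
        return (long) valueCount * bytesPerValue;
      }

      public static void main(String[] args) {
        int incomingRows = 65_536;   // rows in the incoming batch (the sizer's element count)
        int hashTableKeys = 4_096;   // distinct group-by keys currently in the hash table (assumed)
        int varcharEstimate = 50;    // estimated bytes per VarChar value (assumed)

        // Sizing by incoming rows over-allocates whenever many rows collapse onto few keys.
        System.out.println("sized by incoming rows:   " + bytesNeeded(incomingRows, varcharEstimate));
        // Sizing by the hash table's key count matches the number of values actually emitted,
        // since the outgoing batch carries one row per key.
        System.out.println("sized by hash table keys: " + bytesNeeded(hashTableKeys, varcharEstimate));
      }
    }
    ```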

