Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1101#discussion_r166411194
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
    @@ -733,28 +780,32 @@ private void restoreReservedMemory() {
        * @param records
        */
       private void allocateOutgoing(int records) {
    -    // Skip the keys and only allocate for outputting the workspace values
    -    // (keys will be output through splitAndTransfer)
    -    Iterator<VectorWrapper<?>> outgoingIter = outContainer.iterator();
    -    for (int i = 0; i < numGroupByOutFields; i++) {
    -      outgoingIter.next();
    -    }
    -
         // try to preempt an OOM by using the reserved memory
         useReservedOutgoingMemory();
         long allocatedBefore = allocator.getAllocatedMemory();
     
    -    while (outgoingIter.hasNext()) {
    +    for (int columnIndex = numGroupByOutFields; columnIndex < outContainer.getNumberOfColumns(); columnIndex++) {
    +      final VectorWrapper wrapper = outContainer.getValueVector(columnIndex);
           @SuppressWarnings("resource")
    -      ValueVector vv = outgoingIter.next().getValueVector();
    +      final ValueVector vv = wrapper.getValueVector();
     
    -      AllocationHelper.allocatePrecomputedChildCount(vv, records, maxColumnWidth, 0);
    +      final RecordBatchSizer.ColumnSize columnSizer = new RecordBatchSizer.ColumnSize(wrapper.getValueVector());
    +      int columnSize;
    +
    +      if (columnSizer.hasKnownSize()) {
    +        // For fixed width vectors we know the size of each record
    +        columnSize = columnSizer.getKnownSize();
    +      } else {
    +        // For var chars we need to use the input estimate
    +        columnSize = varcharValueSizes.get(columnIndex);
    +      }
    +
    +      AllocationHelper.allocatePrecomputedChildCount(vv, records, columnSize, 0);
    --- End diff ---
    
    Hmm. The element count from the sizer would tell us the number of rows in
    the incoming batch, but that will not give us an accurate prediction of the
    number of keys we have aggregated values for. Maybe we should use the number
    of keys in the hash table instead.
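
    To make that concrete, here is a minimal, self-contained sketch (plain Java,
    not the Drill API; the row count, key count, and 50-byte VarChar estimate are
    made-up assumptions) of how much sizing a variable-width outgoing column by
    the incoming row count can over-allocate compared with sizing it by the
    number of keys actually held in the hash table:

    ```java
    // Illustration only: the numbers below are assumptions, and none of this is Drill API.
    public class OutgoingSizingSketch {

      // Bytes needed for one variable-width column of the outgoing batch.
      static long bytesNeeded(int valueCount, int bytesPerValue) {
        return (long) valueCount * bytesPerValue;
      }

      public static void main(String[] args) {
        int incomingRows = 65_536;   // rows in the incoming batch (the sizer's element count)
        int hashTableKeys = 4_096;   // distinct group-by keys currently in the hash table (assumed)
        int varcharEstimate = 50;    // estimated bytes per VarChar value (assumed)

        // Sizing by incoming rows over-allocates whenever many rows collapse onto few keys.
        System.out.println("sized by incoming rows:   " + bytesNeeded(incomingRows, varcharEstimate));
        // Sizing by the hash table's key count matches the number of values actually emitted,
        // since the outgoing batch carries one row per key.
        System.out.println("sized by hash table keys: " + bytesNeeded(hashTableKeys, varcharEstimate));
      }
    }
    ```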

