[ https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354394#comment-16354394 ]

ASF GitHub Bot commented on DRILL-6032:
---------------------------------------

Github user ilooner commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1101#discussion_r166411194
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
    @@ -733,28 +780,32 @@ private void restoreReservedMemory() {
        * @param records
        */
       private void allocateOutgoing(int records) {
    -    // Skip the keys and only allocate for outputting the workspace values
    -    // (keys will be output through splitAndTransfer)
    -    Iterator<VectorWrapper<?>> outgoingIter = outContainer.iterator();
    -    for (int i = 0; i < numGroupByOutFields; i++) {
    -      outgoingIter.next();
    -    }
    -
         // try to preempt an OOM by using the reserved memory
         useReservedOutgoingMemory();
         long allocatedBefore = allocator.getAllocatedMemory();
     
    -    while (outgoingIter.hasNext()) {
    +    for (int columnIndex = numGroupByOutFields; columnIndex < outContainer.getNumberOfColumns(); columnIndex++) {
    +      final VectorWrapper wrapper = outContainer.getValueVector(columnIndex);
           @SuppressWarnings("resource")
    -      ValueVector vv = outgoingIter.next().getValueVector();
    +      final ValueVector vv = wrapper.getValueVector();
     
    -      AllocationHelper.allocatePrecomputedChildCount(vv, records, maxColumnWidth, 0);
    +      final RecordBatchSizer.ColumnSize columnSizer = new RecordBatchSizer.ColumnSize(wrapper.getValueVector());
    +      int columnSize;
    +
    +      if (columnSizer.hasKnownSize()) {
    +        // For fixed width vectors we know the size of each record
    +        columnSize = columnSizer.getKnownSize();
    +      } else {
    +        // For var chars we need to use the input estimate
    +        columnSize = varcharValueSizes.get(columnIndex);
    +      }
    +
    +      AllocationHelper.allocatePrecomputedChildCount(vv, records, columnSize, 0);
    --- End diff --
    
    Hmm. The element count from the sizer would tell us the number of rows in 
the incoming batch. But that will not give us an accurate prediction for the 
number of keys we have aggregations for. Maybe we should use the number of keys 
in the hashtable instead.
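
For context, a minimal standalone sketch (not Drill code) of the estimate being weighed
here: an average per-value width obtained by dividing the total variable-width data size
by either the incoming row count (what the sizer's element count reports) or the number
of keys currently held in the hash table. Every class, method, and parameter name below
is hypothetical and only illustrates the trade-off between the two denominators.

    // Hypothetical sketch, not Drill's API: choose a per-value width for
    // pre-allocating an outgoing variable-width (varchar) vector.
    public class VarcharWidthEstimate {

      /**
       * @param totalVarcharBytes total varchar bytes observed in the incoming batch
       * @param incomingRowCount  rows in the incoming batch (the sizer's element count)
       * @param hashTableKeyCount distinct group-by keys currently in the hash table
       * @param useHashTableKeys  switch to the denominator suggested in the review comment
       */
      static int estimatePerValueWidth(long totalVarcharBytes,
                                       int incomingRowCount,
                                       int hashTableKeyCount,
                                       boolean useHashTableKeys) {
        int denominator = useHashTableKeys ? hashTableKeyCount : incomingRowCount;
        if (denominator <= 0) {
          return 1; // nothing to average over; fall back to a minimal width
        }
        // Round up so the pre-allocation is not systematically short.
        return (int) ((totalVarcharBytes + denominator - 1) / denominator);
      }

      public static void main(String[] args) {
        // 10 KB of varchar data, 1000 incoming rows, but only 50 distinct keys:
        System.out.println(estimatePerValueWidth(10_000, 1000, 50, false)); // 10 bytes/value
        System.out.println(estimatePerValueWidth(10_000, 1000, 50, true));  // 200 bytes/value
      }
    }

The fewer distinct keys there are relative to incoming rows, the further the two
estimates drift apart, which is the gap the comment above points at.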


> Use RecordBatchSizer to estimate size of columns in HashAgg
> -----------------------------------------------------------
>
>                 Key: DRILL-6032
>                 URL: https://issues.apache.org/jira/browse/DRILL-6032
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
>
>
> We need to use the RecordBatchSizer to estimate the size of columns in the
> partition batches created by HashAgg.


