[jira] [Commented] (DRILL-6032) Use RecordBatchSizer to estimate size of columns in HashAgg

ASF GitHub Bot (JIRA) Mon, 29 Jan 2018 18:11:23 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344397#comment-16344397
 ]


ASF GitHub Bot commented on DRILL-6032:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1101#discussion_r164623527
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java ---
    @@ -232,9 +251,8 @@ else if (width > 0) {
         }
       }
     
    -  public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; // 16 MiB
    -
       private List<ColumnSize> columnSizes = new ArrayList<>();
    +  private Map<String, ColumnSize> columnSizeMap = CaseInsensitiveMap.newHashMap();
    --- End diff --
    
    Drill is case-insensitive internally, so the case-insensitive map is correct. 
Thanks for catching this @ilooner!
    
    Unfortunately, record batches have no name space: they are just a 
collection of vectors. So, we could end up with columns called both "c" and 
"C". This situation will cause the column size map to end up with one entry for 
both columns, with the last writer winning.
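
    A minimal sketch of that collision, for illustration only. It does not use 
Drill's CaseInsensitiveMap; a TreeMap with String.CASE_INSENSITIVE_ORDER stands 
in for it, and the class name, method name, and integer widths are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class ColumnSizeCollision {
  // Assumption: TreeMap with a case-insensitive comparator stands in for
  // Drill's CaseInsensitiveMap; Integer widths stand in for ColumnSize.
  static Map<String, Integer> sizesByName(String[] names, int[] widths) {
    Map<String, Integer> map = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    for (int i = 0; i < names.length; i++) {
      // "c" and "C" compare as equal keys, so the later put replaces the
      // value stored by the earlier one.
      map.put(names[i], widths[i]);
    }
    return map;
  }

  public static void main(String[] args) {
    Map<String, Integer> map =
        sizesByName(new String[] {"c", "C"}, new int[] {4, 50});
    System.out.println(map.size() + " entry, c -> " + map.get("c"));
  }
}
```

    With the two columns above, the map holds a single entry, and looking up 
either "c" or "C" returns the width written last.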
    
    The best solution would be to enforce name space uniqueness when creating 
vectors. The new "result set loader" does this, but I suspect other readers 
might not -- depending on the particular way that they create their vectors. 
Still, creating names that differ only in case is a bug and any code doing that 
should be fixed. 
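
    One way such a uniqueness check could look, as a rough sketch. This is not 
the "result set loader" code; the UniqueColumnNames class and its methods are 
hypothetical and only show the idea of rejecting names that differ only in case 
at the point where a column is created.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class UniqueColumnNames {
  // Names are stored lower-cased so that "c" and "C" map to the same key.
  private final Set<String> seen = new HashSet<>();

  /** Registers a column name; throws if it matches an earlier name ignoring case. */
  void add(String name) {
    if (!seen.add(name.toLowerCase(Locale.ROOT))) {
      throw new IllegalArgumentException(
          "Duplicate column name (ignoring case): " + name);
    }
  }

  /** Returns true if adding the second name after the first is rejected. */
  static boolean rejects(String first, String second) {
    UniqueColumnNames names = new UniqueColumnNames();
    names.add(first);
    try {
      names.add(second);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(rejects("c", "C"));
    System.out.println(rejects("c", "d"));
  }
}
```

    Failing at creation time surfaces the offending reader directly, rather 
than letting the size map silently merge two columns later.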


> Use RecordBatchSizer to estimate size of columns in HashAgg
> -----------------------------------------------------------
>
>                 Key: DRILL-6032
>                 URL: https://issues.apache.org/jira/browse/DRILL-6032
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Timothy Farkas
>            Assignee: Timothy Farkas
>            Priority: Major
>             Fix For: 1.13.0
>
>
> We need to use the RecordBatchSizer to estimate the size of columns in the 
> Partition batches created by HashAgg.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
