[ https://issues.apache.org/jira/browse/DRILL-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372441#comment-16372441 ]

ASF GitHub Bot commented on DRILL-6126:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r169857960
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -418,11 +438,13 @@ private void measureColumn(ValueVector v, String prefix) {
         netRowWidthCap50 += ! colSize.isVariableWidth ? colSize.estSize :
             8 /* offset vector */ + roundUpToPowerOf2(Math.min(colSize.estSize,50));
             // above change 8 to 4 after DRILL-5446 is fixed
    +
    +    return colSize;
       }
     
    -  private void expandMap(AbstractMapVector mapVector, String prefix) {
    +  private void expandMap(ColumnSize colSize, AbstractMapVector mapVector, String prefix) {
         for (ValueVector vector : mapVector) {
    -      measureColumn(vector, prefix);
    +      colSize.childColumnSizes.put(prefix + vector.getField().getName(), measureColumn(vector, prefix));
    --- End diff --
    
    This is subject to aliasing. Suppose I have two maps:
    
    ```
    aa(b)
    a(ab)
    ```
    When I add the child vectors, both will produce a combined name of `aab`.
    
    We can't use a dot as the separator either, for the same reason: dots may appear in names.
    
    ```
    a.b(c)
    a(b.c)
    ```
    
    Both will produce `a.b.c`.
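    To make the collision concrete, here is a small standalone Java sketch. `flatKey` is a hypothetical helper that mimics building a key from a map prefix and a child name (it is not Drill code), and the sizes are made up. With an empty separator or a dot, both pairs of columns map to the same key, and a flat `Map` keyed this way silently keeps only one entry.
    
    ```java
    import java.util.HashMap;
    import java.util.Map;
    
    class NameAliasingDemo {
    
        // Hypothetical helper (not Drill code): builds a flat key by joining a
        // map prefix and a child name with the given separator.
        static String flatKey(String mapName, String childName, String separator) {
            return mapName + separator + childName;
        }
    
        public static void main(String[] args) {
            // Plain concatenation: aa(b) and a(ab) both become "aab".
            String k1 = flatKey("aa", "b", "");
            String k2 = flatKey("a", "ab", "");
            System.out.println(k1 + " vs " + k2 + ", collide: " + k1.equals(k2));
    
            // Dot separator: a.b(c) and a(b.c) both become "a.b.c",
            // because dots are legal inside names.
            String k3 = flatKey("a.b", "c", ".");
            String k4 = flatKey("a", "b.c", ".");
            System.out.println(k3 + " vs " + k4 + ", collide: " + k3.equals(k4));
    
            // In a flat map keyed this way, the second put overwrites the first.
            Map<String, Integer> sizes = new HashMap<>();
            sizes.put(k1, 10);
            sizes.put(k2, 20);
            System.out.println("entries kept: " + sizes.size());
        }
    }
    ```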
    
    In the new "result set loader" code, all places that handle trees of columns use actual trees of maps.
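    For comparison, a rough sketch of the tree-of-maps idea: each node keeps its children in a map keyed by the child's own unqualified name, and lookups walk one path element per level, so no combined string exists to alias. `ColumnNode`, `find`, `buildExample`, and the sizes are hypothetical and only illustrate the idea; they are not the result set loader's actual classes.
    
    ```java
    import java.util.Arrays;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    
    class TreeOfMapsDemo {
    
        // Hypothetical stand-in for a per-column size record (not Drill's ColumnSize).
        static class ColumnNode {
            final String name;
            final int estSize;
            // Children are keyed by their own unqualified name; the parent node
            // provides the scope, so no joined key string is ever needed.
            final Map<String, ColumnNode> children = new LinkedHashMap<>();
    
            ColumnNode(String name, int estSize) {
                this.name = name;
                this.estSize = estSize;
            }
    
            ColumnNode addChild(String childName, int childSize) {
                ColumnNode child = new ColumnNode(childName, childSize);
                children.put(childName, child);
                return child;
            }
        }
    
        // Walks one path element per level; returns null if any element is missing.
        static ColumnNode find(Map<String, ColumnNode> roots, List<String> path) {
            Map<String, ColumnNode> level = roots;
            ColumnNode node = null;
            for (String part : path) {
                node = level.get(part);
                if (node == null) {
                    return null;
                }
                level = node.children;
            }
            return node;
        }
    
        // Builds the aa(b) and a(ab) example from the review comment.
        static Map<String, ColumnNode> buildExample() {
            Map<String, ColumnNode> roots = new LinkedHashMap<>();
            ColumnNode aa = new ColumnNode("aa", 0);
            ColumnNode a = new ColumnNode("a", 0);
            aa.addChild("b", 4);   // hypothetical size
            a.addChild("ab", 8);   // hypothetical size
            roots.put(aa.name, aa);
            roots.put(a.name, a);
            return roots;
        }
    
        public static void main(String[] args) {
            Map<String, ColumnNode> roots = buildExample();
            // The two children stay distinct because each lives under its own parent.
            System.out.println(find(roots, Arrays.asList("aa", "b")).estSize);
            System.out.println(find(roots, Arrays.asList("a", "ab")).estSize);
        }
    }
    ```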
    
    A crude-but-effective solution is to join names with a character that is not legal in a name. The only such character is the back-tick, since we use it in SQL to quote names. If we do that, we now have:
    
    ```
    aa`b
    a`ab
    a.b`c
    a`b.c
    ```
    
    And the names are now un-aliased.
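    A minimal sketch of the back-tick scheme, assuming (as stated above) that a back-tick can never appear in a column name. `qualify` is a hypothetical helper, not existing Drill code; it rejects child names that contain the separator so the assumption is enforced rather than silently relied on.
    
    ```java
    import java.util.Arrays;
    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Set;
    
    class BacktickKeyDemo {
    
        // Separator proposed in the review; assumed never to occur in a column name.
        static final String SEP = "`";
    
        // Hypothetical helper: appends a child name to an already-qualified prefix.
        // An empty prefix means a top-level column.
        static String qualify(String prefix, String childName) {
            if (childName.contains(SEP)) {
                // Enforce the assumption instead of silently relying on it.
                throw new IllegalArgumentException("Name contains separator: " + childName);
            }
            return prefix.isEmpty() ? childName : prefix + SEP + childName;
        }
    
        public static void main(String[] args) {
            List<String> keys = Arrays.asList(
                    qualify("aa", "b"),    // aa`b
                    qualify("a", "ab"),    // a`ab
                    qualify("a.b", "c"),   // a.b`c
                    qualify("a", "b.c"));  // a`b.c
            Set<String> distinct = new LinkedHashSet<>(keys);
            System.out.println(keys);
            System.out.println("all distinct: " + (distinct.size() == keys.size()));
        }
    }
    ```
    
    Because the separator never occurs inside a name, splitting a key on back-ticks recovers the original path, which is why distinct paths always yield distinct keys.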


> Allocate memory for value vectors upfront in flatten operator
> -------------------------------------------------------------
>
>                 Key: DRILL-6126
>                 URL: https://issues.apache.org/jira/browse/DRILL-6126
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> With recent changes to control batch size for the flatten operator, we figure out
> the row count of the output batch based on memory. Since we know how many rows we
> are going to include in the batch, we can also allocate the memory needed
> upfront instead of starting with an initial value (4096) and then doubling and
> copying every time we need more.
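A rough simulation of the saving, assuming a vector whose capacity starts at the 4096 values mentioned above and doubles whenever it fills, copying the old contents each time. `doublingCost` and the 65536-row batch are hypothetical; allocator rounding and per-value widths are ignored, and the initial allocation is not counted as a reallocation.

```java
class UpfrontAllocationDemo {

    // Starting capacity quoted in the issue description (in values).
    static final int INITIAL_CAPACITY = 4096;

    // Simulates "start small, double, copy" for a vector that must end up
    // holding rowCount values. Returns {reallocations, valuesCopied}.
    static long[] doublingCost(int rowCount) {
        long capacity = INITIAL_CAPACITY;
        long reallocations = 0;
        long valuesCopied = 0;
        while (capacity < rowCount) {
            valuesCopied += capacity; // the full old buffer is copied into the new one
            capacity *= 2;
            reallocations++;
        }
        return new long[] {reallocations, valuesCopied};
    }

    public static void main(String[] args) {
        int rowCount = 65536; // hypothetical row count chosen from the memory budget
        long[] cost = doublingCost(rowCount);
        System.out.println("doubling: " + cost[0] + " reallocations, "
                + cost[1] + " values copied");
        // Allocating rowCount values upfront needs no reallocation and no copy.
        System.out.println("upfront: 0 reallocations, 0 values copied");
    }
}
```

For a 65536-row batch this reports 4 reallocations and 61440 values copied, all of which upfront allocation avoids.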



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
