[ https://issues.apache.org/jira/browse/DRILL-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372443#comment-16372443 ]

ASF GitHub Bot commented on DRILL-6126:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r169859674
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -245,16 +251,30 @@ private void buildVectorInitializer(VectorInitializer initializer) {
           else if (width > 0) {
             initializer.variableWidth(name, width);
           }
    +
    +      for (ColumnSize columnSize : childColumnSizes.values()) {
    +        columnSize.buildVectorInitializer(initializer);
    +      }
         }
    +
       }
     
       public static ColumnSize getColumn(ValueVector v, String prefix) {
         return new ColumnSize(v, prefix);
       }
     
    +  public ColumnSize getColumn(String name) {
    +    return allColumnSizes.get(name);
    +  }
    +
       public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; // 16 MiB
     
    -  private Map<String, ColumnSize> columnSizes = CaseInsensitiveMap.newHashMap();
    +  // This keeps information for all columns i.e. all top columns and nested columns underneath
    +  private Map<String, ColumnSize> allColumnSizes = CaseInsensitiveMap.newHashMap();
    --- End diff --
    
    I'm a bit confused. I saw above that a `ColumnSize` has a list of children.
    Why are the children repeated here, introducing the naming issues described
    below?
    
    Given the column alias issues, and how the column index is used elsewhere,
    can this just be an index of top-level columns? Then, to size map vectors
    (the only vectors that have the nesting issue), use the recursion code that
    already exists in the `VectorInitializer`?
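
To make the suggestion concrete, here is a minimal, self-contained sketch of an index that holds only top-level columns and reaches nested columns by recursion. It is an illustration, not Drill code: the `Col` class, `sampleIndex`, and `leafWidths` are hypothetical names, and a `TreeMap` with `String.CASE_INSENSITIVE_ORDER` stands in for Drill's `CaseInsensitiveMap`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class TopLevelIndexSketch {

  /** Hypothetical stand-in for a column-size node; not Drill's ColumnSize. */
  static final class Col {
    final String name;
    final int width; // assumed estimate of bytes per value; 0 for a map
    final Map<String, Col> children = new LinkedHashMap<>();

    Col(String name, int width) {
      this.name = name;
      this.width = width;
    }

    Col add(Col child) {
      children.put(child.name, child);
      return this;
    }

    /** Records this column's width if it is a leaf, then recurses into any children. */
    void collect(String prefix, Map<String, Integer> out) {
      String path = prefix.isEmpty() ? name : prefix + "." + name;
      if (children.isEmpty()) {
        out.put(path, width);
      }
      for (Col child : children.values()) {
        child.collect(path, out);
      }
    }
  }

  /** Builds a small index holding only top-level columns, keyed case-insensitively. */
  static Map<String, Col> sampleIndex() {
    Map<String, Col> index = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    index.put("a", new Col("a", 4));
    index.put("m", new Col("m", 0).add(new Col("a", 8)).add(new Col("b", 50)));
    return index;
  }

  /** Walks the top-level index and reaches nested columns through recursion. */
  static Map<String, Integer> leafWidths(Map<String, Col> index) {
    Map<String, Integer> out = new LinkedHashMap<>();
    for (Col col : index.values()) {
      col.collect("", out);
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(leafWidths(sampleIndex()));
  }
}
```

In this sketch the top-level `a` and the nested `m.a` never share a map, so the flat index cannot confuse them; nested columns are only ever reached through their parent.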


> Allocate memory for value vectors upfront in flatten operator
> -------------------------------------------------------------
>
>                 Key: DRILL-6126
>                 URL: https://issues.apache.org/jira/browse/DRILL-6126
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> With recent changes to control batch size for the flatten operator, we compute
> the row count of the output batch based on memory. Since we know how many rows
> we are going to include in the batch, we can also allocate the memory needed
> upfront instead of starting with the initial value (4096) and doubling, copying
> every time we need more.
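
The cost being avoided can be estimated with a small, self-contained sketch. It is not Drill's allocator: the class and method names are hypothetical, 4096 is treated as an initial capacity in values (the description does not state the unit), and each doubling is assumed to copy a full buffer.

```java
public class UpfrontAllocationSketch {

  // Initial capacity taken from the issue description; the unit (values) is an assumption.
  static final int INITIAL_CAPACITY = 4096;

  /** Number of doublings needed before a buffer can hold `needed` values. */
  static int doublings(int needed) {
    int count = 0;
    long capacity = INITIAL_CAPACITY;
    while (capacity < needed) {
      capacity *= 2;
      count++;
    }
    return count;
  }

  /** Values copied across all reallocations, assuming each doubling copies a full buffer. */
  static long valuesCopied(int needed) {
    long copied = 0;
    long capacity = INITIAL_CAPACITY;
    while (capacity < needed) {
      copied += capacity;
      capacity *= 2;
    }
    return copied;
  }

  public static void main(String[] args) {
    int rows = 65536;
    System.out.println("doublings: " + doublings(rows));
    System.out.println("values copied: " + valuesCopied(rows));
  }
}
```

Under these assumptions, a batch of 65536 rows takes four doublings and copies 61440 values along the way, while a single upfront allocation sized from the known row count does neither.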



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
