[jira] [Commented] (DRILL-6126) Allocate memory for value vectors upfront in flatten operator

ASF GitHub Bot (JIRA) Sun, 04 Mar 2018 23:32:27 -0800

    [ https://issues.apache.org/jira/browse/DRILL-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385713#comment-16385713 ]


ASF GitHub Bot commented on DRILL-6126:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r172104598
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -76,110 +82,327 @@
          * greater than (but unlikely) same as the row count.
          */
     
    -    public final int valueCount;
    +    private final int valueCount;
     
         /**
    -     * Total number of elements for a repeated type, or 1 if this is
    -     * a non-repeated type. That is, a batch of 100 rows may have an
    -     * array with 10 elements per row. In this case, the element count
    -     * is 1000.
    +     * Total number of elements for a repeated type, or same as
    +     * valueCount if this is a non-repeated type. That is, a batch
    +     * of 100 rows may have an array with 10 elements per row.
    +     * In this case, the element count is 1000.
          */
     
    -    public final int elementCount;
    +    private int elementCount;
     
         /**
    -     * Size of the top level value vector. For map and repeated list,
    -     * this is just size of offset vector.
    +     * The estimated, average number of elements per parent value.
    +     * Always 1 for a non-repeated type. For a repeated type,
    +     * this is the average entries per array (per repeated element).
          */
    -    public int dataSize;
    +
    +    private float estElementCountPerArray;
     
         /**
    -     * Total size of the column includes the sum total of memory for all
    -     * value vectors representing the column.
    +     * Indicates if it is a variable-width column.
    +     * For map columns, this is true if any of the children is a
    +     * variable-width column.
          */
    -    public int netSize;
    +
    +    private boolean isVariableWidth;
     
         /**
    -     * The estimated, average number of elements per parent value.
    -     * Always 1 for a non-repeated type. For a repeated type,
    -     * this is the average entries per array (per repeated element).
    +     * Indicates if cardinality is repeated (top level only).
    +     */
    +
    +    private boolean isRepeated;
    --- End diff --
    
    Might be fun to check out the new metadata classes added for the result set loader. They parse the `MajorType` to pull out this kind of information. You could embed an instance of the `ColumnMetadata` class here to provide this detailed information.
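A rough sketch of the composition the review suggests: the sizer holds one metadata object and delegates type questions to it, rather than storing its own `isRepeated` and `isVariableWidth` flags. The `Mode`/`Minor` enums and the `ColumnMeta` class are hypothetical, simplified stand-ins for `MajorType` and `ColumnMetadata`; they do not reproduce Drill's real interfaces. The map rule (variable width if any child is) follows the Javadoc in the diff.

```java
import java.util.List;

/** Hypothetical sketch; none of these types are Drill's real classes. */
public final class MetadataSketch {
  enum Mode { REQUIRED, OPTIONAL, REPEATED }
  enum Minor { INT, BIGINT, VARCHAR, VARBINARY, MAP }

  /** Stand-in for column metadata parsed once from a (minor type, mode) pair. */
  static final class ColumnMeta {
    final Minor type;
    final Mode mode;
    final List<ColumnMeta> children;  // only meaningful for MAP

    ColumnMeta(Minor type, Mode mode, List<ColumnMeta> children) {
      this.type = type;
      this.mode = mode;
      this.children = children;
    }

    boolean isRepeated() {
      return mode == Mode.REPEATED;  // top-level cardinality only
    }

    boolean isVariableWidth() {
      switch (type) {
        case VARCHAR:
        case VARBINARY:
          return true;
        case MAP:
          // A map is variable width if any child is variable width.
          return children.stream().anyMatch(ColumnMeta::isVariableWidth);
        default:
          return false;
      }
    }
  }

  /** Sizer entry that embeds the metadata instead of copying its flags. */
  static final class ColumnSize {
    final ColumnMeta metadata;
    final int valueCount;

    ColumnSize(ColumnMeta metadata, int valueCount) {
      this.metadata = metadata;
      this.valueCount = valueCount;
    }

    boolean isRepeated() { return metadata.isRepeated(); }
    boolean isVariableWidth() { return metadata.isVariableWidth(); }
  }

  public static void main(String[] args) {
    ColumnMeta id = new ColumnMeta(Minor.INT, Mode.REQUIRED, List.of());
    ColumnMeta name = new ColumnMeta(Minor.VARCHAR, Mode.OPTIONAL, List.of());
    ColumnMeta map = new ColumnMeta(Minor.MAP, Mode.REPEATED, List.of(id, name));
    ColumnSize size = new ColumnSize(map, 100);
    // prints "true true"
    System.out.println(size.isRepeated() + " " + size.isVariableWidth());
  }
}
```

Delegating keeps the type-derived facts in one place, so the sizer only needs to track measured quantities such as counts and sizes.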


> Allocate memory for value vectors upfront in flatten operator
> -------------------------------------------------------------
>
>                 Key: DRILL-6126
>                 URL: https://issues.apache.org/jira/browse/DRILL-6126
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> With recent changes to control batch size for the flatten operator, we figure out
> the row count of the output batch based on memory. Since we know how many rows we
> are going to include in the batch, we can also allocate the needed memory
> upfront instead of starting with an initial value (4096) and doubling and copying
> every time we need more.
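A simplified model of the two strategies described in the issue, using plain `int` arrays rather than Drill's value vectors. The 4096 starting size is taken from the issue text and treated here as a value count; `UpfrontAllocationSketch`, `FillStats`, and both fill methods are hypothetical, and Drill's actual allocator and vector classes are not modeled.

```java
import java.util.Arrays;

/** Hypothetical model contrasting grow-by-doubling with a single upfront allocation. */
public final class UpfrontAllocationSketch {
  static final int INITIAL_CAPACITY = 4096;  // initial value count cited in the issue

  /** How many times the buffer grew, how many values were copied, final capacity. */
  static final class FillStats {
    final int reallocations;
    final long valuesCopied;
    final int finalCapacity;

    FillStats(int reallocations, long valuesCopied, int finalCapacity) {
      this.reallocations = reallocations;
      this.valuesCopied = valuesCopied;
      this.finalCapacity = finalCapacity;
    }
  }

  /** Start at INITIAL_CAPACITY; when full, double the buffer and copy existing values. */
  static FillStats fillWithDoubling(int rowCount) {
    int[] buf = new int[INITIAL_CAPACITY];
    int reallocations = 0;
    long copied = 0;
    for (int i = 0; i < rowCount; i++) {
      if (i == buf.length) {
        copied += buf.length;                    // every existing value is copied
        buf = Arrays.copyOf(buf, buf.length * 2);
        reallocations++;
      }
      buf[i] = i;
    }
    return new FillStats(reallocations, copied, buf.length);
  }

  /** Row count already known (as computed from the memory budget): allocate once. */
  static FillStats fillUpfront(int rowCount) {
    int[] buf = new int[rowCount];
    for (int i = 0; i < rowCount; i++) {
      buf[i] = i;
    }
    return new FillStats(0, 0, buf.length);
  }

  public static void main(String[] args) {
    int rows = 10_000;
    FillStats d = fillWithDoubling(rows);
    FillStats u = fillUpfront(rows);
    // prints "doubling: 2 reallocations, 12288 values copied, capacity 16384"
    System.out.println("doubling: " + d.reallocations + " reallocations, "
        + d.valuesCopied + " values copied, capacity " + d.finalCapacity);
    // prints "upfront: 0 reallocations, 0 values copied, capacity 10000"
    System.out.println("upfront: " + u.reallocations + " reallocations, "
        + u.valuesCopied + " values copied, capacity " + u.finalCapacity);
  }
}
```

Doubling keeps the amortized cost linear, but each growth step copies every value written so far and can leave up to roughly half of the final buffer unused; when the row count is known in advance, one allocation avoids both costs.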



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
