[ https://issues.apache.org/jira/browse/DRILL-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385714#comment-16385714 ]
ASF GitHub Bot commented on DRILL-6126:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r172104395

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -76,110 +82,327 @@
        * greater than (but unlikely) same as the row count.
        */
    -  public final int valueCount;
    +  private final int valueCount;

       /**
    -   * Total number of elements for a repeated type, or 1 if this is
    -   * a non-repeated type. That is, a batch of 100 rows may have an
    -   * array with 10 elements per row. In this case, the element count
    -   * is 1000.
    +   * Total number of elements for a repeated type, or same as
    +   * valueCount if this is a non-repeated type. That is, a batch
    +   * of 100 rows may have an array with 10 elements per row.
    +   * In this case, the element count is 1000.
        */
    -  public final int elementCount;
    +  private int elementCount;

       /**
    -   * Size of the top level value vector. For map and repeated list,
    -   * this is just size of offset vector.
    +   * The estimated, average number of elements per parent value.
    +   * Always 1 for a non-repeated type. For a repeated type,
    +   * this is the average entries per array (per repeated element).
        */
    -  public int dataSize;
    +
    +  private float estElementCountPerArray;
    --- End diff --

    Perhaps `estCardinality`? I've been using the term "cardinality" in new code to refer to array sizes.

> Allocate memory for value vectors upfront in flatten operator
> -------------------------------------------------------------
>
>                 Key: DRILL-6126
>                 URL: https://issues.apache.org/jira/browse/DRILL-6126
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> With recent changes to control batch size for the flatten operator, we
> figure out the row count in the output batch based on memory.
> Since we know how many rows we are going to include in the batch, we can
> also allocate the memory needed upfront, instead of starting with the
> initial value (4096) and doubling, copying every time we need more.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
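The sizing idea described above can be sketched in plain Java. This is a hypothetical illustration, not Drill's actual `RecordBatchSizer` or value-vector API: given a known output row count and an estimated average array cardinality (the `estElementCountPerArray` field discussed in the diff), compute the buffer size once up front, rather than starting at 4096 values and doubling (which copies the buffer on every growth step). The class and method names below are invented for the sketch.

```java
// Hypothetical sketch of upfront buffer sizing for a repeated
// fixed-width vector. Names are illustrative, not Drill's API.
public class UpfrontAllocSketch {

    /** Round n up to the next power of two, as vector allocators
     *  typically do; returns n itself if it is already a power of two. */
    static int roundUpToPowerOfTwo(int n) {
        int v = Integer.highestOneBit(Math.max(1, n));
        return (v == n) ? v : v << 1;
    }

    /**
     * Data bytes needed for a repeated fixed-width column:
     * elementCount = rowCount * estElementsPerArray, rounded up to a
     * power of two, times the per-value width in bytes. Computing this
     * once avoids the allocate-4096/double/copy growth cycle.
     */
    static int upfrontDataBytes(int rowCount,
                                float estElementsPerArray,
                                int valueWidth) {
        int elementCount = (int) Math.ceil(rowCount * estElementsPerArray);
        return roundUpToPowerOfTwo(elementCount) * valueWidth;
    }

    public static void main(String[] args) {
        // 100 rows with ~10 elements per array of 4-byte ints:
        // 1000 elements -> rounded to 1024 -> 4096 bytes, allocated once.
        System.out.println(upfrontDataBytes(100, 10f, 4)); // 4096
    }
}
```

With doubling from an initial 4096-value allocation the final capacity lands in the same place, but only after intermediate allocations and copies; sizing from the known row count and estimated cardinality reaches it in a single allocation.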