[
https://issues.apache.org/jira/browse/DRILL-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385713#comment-16385713
]
ASF GitHub Bot commented on DRILL-6126:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1125#discussion_r172104598
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
@@ -76,110 +82,327 @@
* greater than (but unlikely) same as the row count.
*/
- public final int valueCount;
+ private final int valueCount;
/**
- * Total number of elements for a repeated type, or 1 if this is
- * a non-repeated type. That is, a batch of 100 rows may have an
- * array with 10 elements per row. In this case, the element count
- * is 1000.
+ * Total number of elements for a repeated type, or same as
+ * valueCount if this is a non-repeated type. That is, a batch
+ * of 100 rows may have an array with 10 elements per row.
+ * In this case, the element count is 1000.
*/
- public final int elementCount;
+ private int elementCount;
/**
- * Size of the top level value vector. For map and repeated list,
- * this is just size of offset vector.
+ * The estimated, average number of elements per parent value.
+ * Always 1 for a non-repeated type. For a repeated type,
+ * this is the average entries per array (per repeated element).
*/
- public int dataSize;
+
+ private float estElementCountPerArray;
/**
- * Total size of the column includes the sum total of memory for all
- * value vectors representing the column.
+ * Indicates if it is variable width column.
+ * For map columns, this is true if any of the children is variable
+ * width column.
*/
- public int netSize;
+
+ private boolean isVariableWidth;
/**
- * The estimated, average number of elements per parent value.
- * Always 1 for a non-repeated type. For a repeated type,
- * this is the average entries per array (per repeated element).
+ * Indicates if cardinality is repeated(top level only).
+ */
+
+ private boolean isRepeated;
--- End diff --
Might be fun to check out the new metadata classes added for the result set
loader. They parse the `MajorType` to pull out this kind of information. You
could embed an instance of the `ColumnMetadata` class here to provide this
detailed information.
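
To make the element-count fields in the diff above concrete, here is a minimal sketch of the arithmetic the Javadoc describes. The class `ColumnCountsSketch` and its methods are hypothetical and are not part of `RecordBatchSizer`; only the example of 100 rows with 10 elements per row comes from the comments in the diff.

```java
// Hypothetical illustration only; this is not the RecordBatchSizer code.
final class ColumnCountsSketch {
  // Number of top-level values in the column for this batch.
  private final int valueCount;
  // Total number of array elements; equals valueCount for a non-repeated type.
  private final int elementCount;

  ColumnCountsSketch(int valueCount, int elementCount) {
    this.valueCount = valueCount;
    this.elementCount = elementCount;
  }

  // A non-repeated column has exactly one element per value.
  static ColumnCountsSketch nonRepeated(int valueCount) {
    return new ColumnCountsSketch(valueCount, valueCount);
  }

  // Average number of elements per parent value (per array).
  float estElementCountPerArray() {
    return valueCount == 0 ? 0f : (float) elementCount / valueCount;
  }

  public static void main(String[] args) {
    // 100 rows with 10 elements each gives an element count of 1000.
    ColumnCountsSketch repeated = new ColumnCountsSketch(100, 100 * 10);
    System.out.println(repeated.estElementCountPerArray());
    System.out.println(nonRepeated(100).estElementCountPerArray());
  }
}
```

With these definitions a non-repeated column always reports an average of 1, which matches the "Always 1 for a non-repeated type" wording in the Javadoc.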
> Allocate memory for value vectors upfront in flatten operator
> -------------------------------------------------------------
>
> Key: DRILL-6126
> URL: https://issues.apache.org/jira/browse/DRILL-6126
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Critical
> Fix For: 1.12.0
>
>
> With recent changes to control batch size for the flatten operator, we figure out
> the row count in the output batch based on memory. Since we know how many rows we
> are going to include in the batch, we can also allocate the memory needed
> upfront instead of starting with the initial value (4096) and doubling, copying
> every time we need more.
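
The cost that the issue description wants to avoid can be estimated with a small sketch. It assumes a buffer that starts at the initial value of 4096 and doubles until it reaches the required capacity, copying its full contents on each growth step in the worst case. `GrowthSketch` and its methods are hypothetical names for illustration and do not correspond to Drill's vector allocation code; the unit of the 4096 figure is left abstract because the description does not state it.

```java
// Hypothetical cost model for grow-by-doubling; not Drill's allocator.
final class GrowthSketch {

  // Number of doubling steps needed to grow from `initial` to at least `required`.
  static int doublingsNeeded(int initial, int required) {
    if (initial <= 0) {
      throw new IllegalArgumentException("initial must be positive");
    }
    int count = 0;
    long capacity = initial;
    while (capacity < required) {
      capacity *= 2;
      count++;
    }
    return count;
  }

  // Worst-case units copied: each step copies the full old buffer.
  static long unitsCopiedWorstCase(int initial, int required) {
    if (initial <= 0) {
      throw new IllegalArgumentException("initial must be positive");
    }
    long copied = 0;
    long capacity = initial;
    while (capacity < required) {
      copied += capacity;
      capacity *= 2;
    }
    return copied;
  }

  public static void main(String[] args) {
    int initial = 4096;
    int required = 65536; // assumed target size, chosen for illustration
    System.out.println("doublings: " + doublingsNeeded(initial, required));
    System.out.println("units copied: " + unitsCopiedWorstCase(initial, required));
  }
}
```

Under this model, allocating the required capacity upfront replaces every doubling step and its copy with a single allocation, which is the saving the issue proposes once the output row count is known.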