[
https://issues.apache.org/jira/browse/DRILL-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372443#comment-16372443
]
ASF GitHub Bot commented on DRILL-6126:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1125#discussion_r169859674
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
@@ -245,16 +251,30 @@ private void buildVectorInitializer(VectorInitializer initializer) {
else if (width > 0) {
initializer.variableWidth(name, width);
}
+
+ for (ColumnSize columnSize : childColumnSizes.values()) {
+ columnSize.buildVectorInitializer(initializer);
+ }
}
+
}
public static ColumnSize getColumn(ValueVector v, String prefix) {
return new ColumnSize(v, prefix);
}
+ public ColumnSize getColumn(String name) {
+ return allColumnSizes.get(name);
+ }
+
public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; // 16 MiB
- private Map<String, ColumnSize> columnSizes = CaseInsensitiveMap.newHashMap();
+ // This keeps information for all columns i.e. all top columns and nested columns underneath
+ private Map<String, ColumnSize> allColumnSizes = CaseInsensitiveMap.newHashMap();
--- End diff --
I'm a bit confused. I saw above that a `ColumnSize` has a list of children.
Why are the children repeated here, introducing the naming issues described
below?
Given the column alias issues, and how the column index is used elsewhere, can
this just be an index of top-level columns? Then, to size map vectors (the only
ones that have the nesting issue), reuse the recursion code that already exists
in the `VectorInitializer`?
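To make the suggestion concrete, here is a minimal sketch of an index that holds only top-level columns and reaches nested ones by recursing through each column's children. Everything in it is hypothetical: `TopLevelIndexSketch`, this simplified `ColumnSize`, `collect` and `sizedLeaves` are illustration-only names, not Drill's actual classes or methods, and a plain `LinkedHashMap` stands in for the case-insensitive map used in the diff.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the sizer; NOT Drill's RecordBatchSizer.
public class TopLevelIndexSketch {

  // Simplified column descriptor: a name, an estimated width, and
  // child columns (non-empty only for map-like columns).
  static final class ColumnSize {
    final String name;
    final int width;
    final Map<String, ColumnSize> children = new LinkedHashMap<>();

    ColumnSize(String name, int width) {
      this.name = name;
      this.width = width;
    }

    ColumnSize addChild(ColumnSize child) {
      children.put(child.name, child);
      return this;
    }

    // The recursion lives in the column itself: a leaf records its
    // qualified path and width, a map column descends into its children.
    void collect(String prefix, List<String> out) {
      String path = prefix.isEmpty() ? name : prefix + "." + name;
      if (children.isEmpty()) {
        out.add(path + ":" + width);
      }
      for (ColumnSize child : children.values()) {
        child.collect(path, out);
      }
    }
  }

  // The index holds top-level columns only; nested columns are not repeated here.
  private final Map<String, ColumnSize> columnSizes = new LinkedHashMap<>();

  void add(ColumnSize column) {
    columnSizes.put(column.name, column);
  }

  ColumnSize getColumn(String name) {
    return columnSizes.get(name);
  }

  // Walks every top-level column and lets each one recurse into its children.
  List<String> sizedLeaves() {
    List<String> out = new ArrayList<>();
    for (ColumnSize column : columnSizes.values()) {
      column.collect("", out);
    }
    return out;
  }
}
```

With a top-level column `a` and a map `m` that also contains a child named `a`, the two never share a key in this layout: the index sees only `a` and `m`, and the nested column is reached as `m.a` through the recursion.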
> Allocate memory for value vectors upfront in flatten operator
> -------------------------------------------------------------
>
> Key: DRILL-6126
> URL: https://issues.apache.org/jira/browse/DRILL-6126
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Critical
> Fix For: 1.12.0
>
>
> With recent changes to control batch size for the flatten operator, we figure out
> the row count in the output batch based on memory. Since we know how many rows we
> are going to include in the batch, we can also allocate the memory needed
> upfront instead of starting with the initial value (4096) and doubling and copying
> every time we need more.
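The cost that the issue description refers to can be estimated with a simplified model. The sketch below assumes a buffer that starts at some initial capacity, doubles whenever it is full, and copies its entire contents on each growth; `GrowthCostSketch` and `valuesCopiedByDoubling` are hypothetical names, and the model is not taken from Drill's vector code.

```java
// Simplified model of grow-by-doubling; NOT Drill's actual reallocation logic.
public class GrowthCostSketch {

  // Counts how many values get copied when a buffer starts at
  // initialCapacity and doubles until it can hold rowCount values.
  // Assumes the buffer is full each time it grows.
  static long valuesCopiedByDoubling(int initialCapacity, int rowCount) {
    long copied = 0;
    long capacity = initialCapacity;
    while (capacity < rowCount) {
      copied += capacity; // the full old buffer moves into the new one
      capacity *= 2;
    }
    return copied;
  }
}
```

Under these assumptions, growing from 4096 to hold 65,536 values takes four reallocations and copies 61,440 values, whereas a buffer sized for the known row count upfront copies none.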