[ https://issues.apache.org/jira/browse/DRILL-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372441#comment-16372441 ]
ASF GitHub Bot commented on DRILL-6126:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1125#discussion_r169857960

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
    @@ -418,11 +438,13 @@ private void measureColumn(ValueVector v, String prefix) {
           netRowWidthCap50 += ! colSize.isVariableWidth ? colSize.estSize :
               8 /* offset vector */ + roundUpToPowerOf2(Math.min(colSize.estSize,50));
               // above change 8 to 4 after DRILL-5446 is fixed
    +
    +    return colSize;
       }

    -  private void expandMap(AbstractMapVector mapVector, String prefix) {
    +  private void expandMap(ColumnSize colSize, AbstractMapVector mapVector, String prefix) {
         for (ValueVector vector : mapVector) {
    -      measureColumn(vector, prefix);
    +      colSize.childColumnSizes.put(prefix + vector.getField().getName(), measureColumn(vector, prefix));
    --- End diff --

This is subject to aliasing. Suppose I have two maps:

```
aa(b)
a(ab)
```

When I add the child vectors, both will produce a combined name of `aab`. We can't use dots in names for the same reason:

```
a.b(c)
a(b.c)
```

Both will produce `a.b.c`. In the new "result set loader" code, all places that handle trees of columns use actual trees of maps.

A crude-but-effective solution is to use a non-legal name character as the separator. The only valid one is the back-tick, since we use that in SQL to quote names. If we do that, we now have

```
aa`b
a`ab
a.b`c
a`b.c
```

And the names are now un-aliased.
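
To make the aliasing concrete, here is a minimal sketch in plain Java. The class `ColumnKeyAliasing` and the helper `key` are hypothetical names used only for illustration; they mimic the `prefix + name` concatenation in the diff and are not part of `RecordBatchSizer`. It assumes column names never contain a back-tick themselves.

```java
public class ColumnKeyAliasing {

    // Hypothetical helper: builds a flattened key from a parent map name,
    // a separator, and a child column name.
    static String key(String separator, String parent, String child) {
        return parent + separator + child;
    }

    public static void main(String[] args) {
        // Plain concatenation: aa(b) and a(ab) collapse to the same key "aab".
        System.out.println(key("", "aa", "b").equals(key("", "a", "ab")));

        // Dot separator: a.b(c) and a(b.c) collapse to the same key "a.b.c".
        System.out.println(key(".", "a.b", "c").equals(key(".", "a", "b.c")));

        // Back-tick separator (assumed never to appear inside a name):
        // the keys stay distinct.
        System.out.println(key("`", "aa", "b").equals(key("`", "a", "ab")));
        System.out.println(key("`", "a.b", "c").equals(key("`", "a", "b.c")));
    }
}
```

With either of the first two separators, a second `put` into a flat map would silently overwrite the first entry. The back-tick form avoids that, though an actual tree of maps sidesteps the separator question entirely.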

> Allocate memory for value vectors upfront in flatten operator
> -------------------------------------------------------------
>
>                 Key: DRILL-6126
>                 URL: https://issues.apache.org/jira/browse/DRILL-6126
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> With recent changes to control batch size for flatten operator, we figure out
> row count in the output batch based on memory. Since we know how many rows we
> are going to include in the batch, we can also allocate the memory needed
> upfront instead of starting with initial value (4096) and doubling, copying
> every time we need more.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)