[
https://issues.apache.org/jira/browse/DRILL-6126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385720#comment-16385720
]
ASF GitHub Bot commented on DRILL-6126:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/1125#discussion_r172105917
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchSizer.java ---
@@ -199,12 +422,18 @@ public String toString() {
.append(", per-array: ")
.append(estElementCountPerArray);
}
- buf .append(", std size: ")
- .append(stdSize)
- .append(", actual size: ")
- .append(estSize)
- .append(", data size: ")
- .append(dataSize)
+ buf .append(", std size per entry: ")
+ .append(getStdDataSizePerEntry())
+ .append(", std net size per entry: ")
+ .append(getStdNetSizePerEntry())
+ .append(", data size per entry: ")
+ .append(getDataSizePerEntry())
+ .append(", net size per entry: ")
+ .append(getNetSizePerEntry())
+ .append(", totalDataSize: ")
+ .append(getTotalDataSize())
+ .append(", totalNetSize: ")
+ .append(getTotalNetSize())
--- End diff --
Extra info is always good. This info is printed by the sort operator when
debug logging is enabled. (It was invaluable when investigating issues found by
QA tests.) Suggestion: find a way to compact the labels. Maybe:
```
Per entry: std size: xx, std net: xxx; Totals: data size: xxx, net size: xxx
```
Also, `totalDataSize` and `totalNetSize` should have spaces: they are
labels, not variable names here.
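To make the suggestion concrete, here is a minimal sketch of one way the compacted output could be built. The class name `CompactSizeLabels`, the `format` helper, and the parameter types are hypothetical and are not part of `RecordBatchSizer`; the six parameters only mirror the getters that appear in the diff. Extending the "Per entry" group to all four per-entry values is one possible reading of the example above.

```java
public class CompactSizeLabels {

    // Hypothetical helper: groups the six values from the diff under two
    // headings so that each label stays short. Types are assumed.
    public static String format(int stdSizePerEntry, int stdNetPerEntry,
                                int dataSizePerEntry, int netSizePerEntry,
                                long totalDataSize, long totalNetSize) {
        StringBuilder buf = new StringBuilder();
        buf.append("Per entry: std size: ").append(stdSizePerEntry)
           .append(", std net: ").append(stdNetPerEntry)
           .append(", data size: ").append(dataSizePerEntry)
           .append(", net size: ").append(netSizePerEntry)
           // Totals use spaced labels rather than variable-style names.
           .append("; Totals: data size: ").append(totalDataSize)
           .append(", net size: ").append(totalNetSize);
        return buf.toString();
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        System.out.println(format(8, 9, 6, 7, 6000L, 7000L));
    }
}
```

Grouping under "Per entry" and "Totals" removes the repeated "per entry" suffix from each label, which keeps the debug line readable when many columns are logged.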
> Allocate memory for value vectors upfront in flatten operator
> -------------------------------------------------------------
>
> Key: DRILL-6126
> URL: https://issues.apache.org/jira/browse/DRILL-6126
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Critical
> Fix For: 1.12.0
>
>
> With the recent changes to control batch size for the flatten operator, we
> determine the row count of the output batch based on memory. Since we know how
> many rows we are going to include in the batch, we can also allocate the memory
> needed up front instead of starting with the initial value (4096) and doubling,
> copying the existing data every time we need more.
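The cost that the description refers to can be illustrated with a small, self-contained sketch. It does not use any Drill API: the class `UpfrontAllocationSketch` and its methods are hypothetical, and the initial value of 4096 is treated here as a capacity in values, which is an assumption. The sketch only counts how many reallocations and copied values the doubling strategy incurs before reaching a known target size, which an upfront allocation would avoid.

```java
public class UpfrontAllocationSketch {

    // Assumed starting capacity, taken from the issue description.
    static final long INITIAL_CAPACITY = 4096;

    // Number of times the buffer must be reallocated (and its contents
    // copied) when growing by doubling until it can hold `needed` values.
    public static int reallocationsWhenDoubling(long needed) {
        long capacity = INITIAL_CAPACITY;
        int reallocations = 0;
        while (capacity < needed) {
            capacity *= 2;
            reallocations++;
        }
        return reallocations;
    }

    // Total number of values copied across all of those reallocations,
    // assuming the buffer is full each time it grows.
    public static long valuesCopiedWhenDoubling(long needed) {
        long capacity = INITIAL_CAPACITY;
        long copied = 0;
        while (capacity < needed) {
            copied += capacity; // old contents move to the new buffer
            capacity *= 2;
        }
        return copied;
    }

    public static void main(String[] args) {
        long rows = 65536; // row count known in advance (made-up figure)
        System.out.println("Doubling: " + reallocationsWhenDoubling(rows)
            + " reallocations, " + valuesCopiedWhenDoubling(rows)
            + " values copied");
        System.out.println("Upfront: 0 reallocations, 0 values copied");
    }
}
```

When the row count is known before the batch is filled, a single allocation of the final size avoids both the intermediate buffers and the repeated copies.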