[
https://issues.apache.org/jira/browse/DRILL-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093508#comment-16093508
]
ASF GitHub Bot commented on DRILL-5601:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/860#discussion_r128129255
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/PriorityQueueCopierWrapper.java ---
@@ -245,29 +250,35 @@ private BatchMerger(PriorityQueueCopierWrapper holder, BatchSchema schema, List<
@Override
public boolean next() {
- Stopwatch w = Stopwatch.createStarted();
long start = holder.getAllocator().getAllocatedMemory();
+
+ // Allocate an outgoing container the "dumb" way (based on static sizes)
+ // for testing, or the "smart" way (based on actual observed data sizes)
+ // for production code.
+
+ if (allocHelper == null) {
+ VectorAccessibleUtilities.allocateVectors(outputContainer, targetRecordCount);
+ } else {
+ allocHelper.allocateBatch(outputContainer, targetRecordCount);
+ }
+ logger.trace("Initial output batch allocation: {} bytes",
+ holder.getAllocator().getAllocatedMemory() - start);
+ Stopwatch w = Stopwatch.createStarted();
int count = holder.copier.next(targetRecordCount);
- copyCount += count;
if (count > 0) {
long t = w.elapsed(TimeUnit.MICROSECONDS);
batchCount++;
- logger.trace("Took {} us to merge {} records", t, count);
long size = holder.getAllocator().getAllocatedMemory() - start;
+ logger.trace("Took {} us to merge {} records, consuming {} bytes of memory",
+ t, count, size);
estBatchSize = Math.max(estBatchSize, size);
} else {
logger.trace("copier returned 0 records");
}
- // Identify the schema to be used in the output container. (Since
- // all merged batches have the same schema, the schema we identify
- // here should be the same as that which we already had.
+ // Initialize output container metadata.
--- End diff --
The removed comments were actually a bit off the mark and reflected a partial
understanding. buildSchema() simply copies the schema from the vectors into the
schema for the batch; it has nothing (directly) to do with the schema of
incoming batches.
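
To illustrate the point about buildSchema(), here is a minimal, self-contained sketch. It is a toy model, not Drill code: ToyVector, ToyContainer, and the "name:type" field strings are hypothetical names made up for this example, and the real method may well differ in detail. The only thing it is meant to show is that the schema is derived from the vectors the container already holds, with no incoming batch involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: these classes are hypothetical stand-ins and do not
// reproduce Drill's actual container, vector, or schema classes.
public class BuildSchemaSketch {

  /** Stand-in for a value vector: it carries its own column name and type. */
  public static final class ToyVector {
    public final String name;
    public final String type;

    public ToyVector(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  /** Stand-in for an output container: vectors plus a schema derived from them. */
  public static final class ToyContainer {
    private final List<ToyVector> vectors = new ArrayList<>();
    private List<String> schema; // null until buildSchema() is called

    public void add(ToyVector v) {
      vectors.add(v);
      schema = null; // adding a vector invalidates any previously built schema
    }

    // Derive the schema purely from the vectors held by this container.
    // Note that no incoming batch is consulted anywhere in this method.
    public void buildSchema() {
      List<String> fields = new ArrayList<>();
      for (ToyVector v : vectors) {
        fields.add(v.name + ":" + v.type);
      }
      schema = Collections.unmodifiableList(fields);
    }

    public List<String> getSchema() {
      if (schema == null) {
        throw new IllegalStateException("schema not built");
      }
      return schema;
    }
  }

  public static void main(String[] args) {
    ToyContainer out = new ToyContainer();
    out.add(new ToyVector("id", "INT"));
    out.add(new ToyVector("name", "VARCHAR"));
    out.buildSchema();
    System.out.println(out.getSchema()); // prints [id:INT, name:VARCHAR]
  }
}
```

In this model, the fact that all merged batches share a schema is at most an indirect reason the result matches what was there before; the method itself only looks at the container's own vectors.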
> Rollup of External Sort memory management fixes
> -----------------------------------------------
>
> Key: DRILL-5601
> URL: https://issues.apache.org/jira/browse/DRILL-5601
> Project: Apache Drill
> Issue Type: Task
> Affects Versions: 1.11.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.12.0
>
>
> Rollup of a set of specific JIRA entries that all relate to the very
> difficult problem of managing memory within Drill in order for the external
> sort to stay within a memory budget. In general, the fixes relate to better
> estimating memory used by the three ways that Drill allocates vector memory
> (see DRILL-5522) and to predicting the size of vectors that the sort will
> create, to avoid repeated realloc-copy cycles (see DRILL-5594).
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)