[
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886818#comment-15886818
]
ASF GitHub Bot commented on DRILL-5284:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/761#discussion_r103334294
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
---
@@ -948,50 +1027,50 @@ private void updateMemoryEstimates(long memoryDelta,
RecordBatchSizer sizer) {
// spill batches of either 64K records, or as many records as fit into
the
// amount of memory dedicated to each spill batch, whichever is less.
- spillBatchRowCount = (int) Math.max(1, spillBatchSize /
estimatedRowWidth);
+ spillBatchRowCount = (int) Math.max(1, preferredSpillBatchSize /
estimatedRowWidth / 2);
spillBatchRowCount = Math.min(spillBatchRowCount, Character.MAX_VALUE);
+ // Compute the actual spill batch size which may be larger or smaller
+ // than the preferred size depending on the row width. Double the
estimated
+ // memory needs to allow for power-of-two rounding.
+
+ targetSpillBatchSize = spillBatchRowCount * estimatedRowWidth * 2;
+
// Determine the number of records per batch per merge step. The goal
is to
// merge batches of either 64K records, or as many records as fit into
the
// amount of memory dedicated to each merge batch, whichever is less.
- targetMergeBatchSize = preferredMergeBatchSize;
- mergeBatchRowCount = (int) Math.max(1, targetMergeBatchSize /
estimatedRowWidth);
+ mergeBatchRowCount = (int) Math.max(1, preferredMergeBatchSize /
estimatedRowWidth / 2);
mergeBatchRowCount = Math.min(mergeBatchRowCount, Character.MAX_VALUE);
+ targetMergeBatchSize = mergeBatchRowCount * estimatedRowWidth * 2;
// Determine the minimum memory needed for spilling. Spilling is done
just
// before accepting a batch, so we must spill if we don't have room
for a
// (worst case) input batch. To spill, we need room for the output
batch created
// by merging the batches already in memory. Double this to allow for
power-of-two
// memory allocations.
- spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
+ long spillPoint = estimatedInputBatchSize + 2 * targetSpillBatchSize;
// The merge memory pool assumes we can spill all input batches. To
make
// progress, we must have at least two merge batches (same size as an
output
// batch) and one output batch. Again, double to allow for power-of-two
// allocation and add one for a margin of error.
- int minMergeBatches = 2 * 3 + 1;
- long minMergeMemory = minMergeBatches * targetMergeBatchSize;
+ long minMergeMemory = Math.round((2 * targetSpillBatchSize +
targetMergeBatchSize) * 1.05);
// If we are in a low-memory condition, then we might not have room
for the
// default output batch size. In that case, pick a smaller size.
- long minMemory = Math.max(spillPoint, minMergeMemory);
- if (minMemory > memoryLimit) {
-
- // Figure out the minimum output batch size based on memory, but
can't be
- // any smaller than the defined minimum.
-
- targetMergeBatchSize = Math.max(MIN_MERGED_BATCH_SIZE, memoryLimit /
minMergeBatches);
+ if (minMergeMemory > memoryLimit) {
- // Regardless of anything else, the batch must hold at least one
- // complete row.
+ // Figure out the minimum output batch size based on memory,
+ // must hold at least one complete row.
- targetMergeBatchSize = Math.max(estimatedRowWidth,
targetMergeBatchSize);
- spillPoint = estimatedInputBatchSize + 2 * spillBatchSize;
- minMergeMemory = minMergeBatches * targetMergeBatchSize;
+ long mergeAllowance = Math.round((memoryLimit - 2 *
targetSpillBatchSize) * 0.95);
+ targetMergeBatchSize = Math.max(estimatedRowWidth, mergeAllowance /
2);
+ mergeBatchRowCount = (int) (targetMergeBatchSize / estimatedRowWidth
/ 2);
--- End diff --
Good catch! Fixed.
> Roll-up of final fixes for managed sort
> ---------------------------------------
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time,
> extensive testing has identified a number of minor fixes and improvements.
> Given the long PR cycles, it is not practical to spend a week or two to do a
> PR for each fix individually. This ticket represents a roll-up of a
> combination of a number of fixes. Small fixes are listed here, larger items
> appear as sub-tasks.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)