[
https://issues.apache.org/jira/browse/DRILL-5284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886815#comment-15886815
]
ASF GitHub Bot commented on DRILL-5284:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/761#discussion_r103335045
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
---
@@ -1231,52 +1308,44 @@ private boolean consolidateBatches() {
* This method spills only half the accumulated batches
* minimizing unnecessary disk writes. The exact count must lie between
* the minimum and maximum spill counts.
- */
+ */
private void spillFromMemory() {
// Determine the number of batches to spill to create a spill file
// of the desired size. The actual file size might be a bit larger
// or smaller than the target, which is expected.
- long estSize = 0;
int spillCount = 0;
+ long spillSize = 0;
for (InputBatch batch : bufferedBatches) {
- estSize += batch.getDataSize();
- if (estSize > spillFileSize) {
- break; }
+ long batchSize = batch.getDataSize();
+ spillSize += batchSize;
spillCount++;
+ if (spillSize + batchSize / 2 > spillFileSize) {
+ break; }
}
- // Should not happen, but just to be sure...
+ // Must always spill at least 2, even if this creates an over-size
+ // spill file.
- if (spillCount == 0) {
- return; }
+ spillCount = Math.max(spillCount, 2);
// Do the actual spill.
- logger.trace("Starting spill from memory. Memory = {}, Buffered batch
count = {}, Spill batch count = {}",
- allocator.getAllocatedMemory(), bufferedBatches.size(),
spillCount);
mergeAndSpill(bufferedBatches, spillCount);
}
private void mergeAndSpill(LinkedList<? extends BatchGroup> source, int
count) {
- if (count == 0) {
- return; }
spilledRuns.add(doMergeAndSpill(source, count));
}
private BatchGroup.SpilledRun doMergeAndSpill(LinkedList<? extends
BatchGroup> batchGroups, int spillCount) {
List<BatchGroup> batchesToSpill = Lists.newArrayList();
spillCount = Math.min(batchGroups.size(), spillCount);
assert spillCount > 0 : "Spill count to mergeAndSpill must not be
zero";
- long spillSize = 0;
for (int i = 0; i < spillCount; i++) {
- @SuppressWarnings("resource")
- BatchGroup batch = batchGroups.pollFirst();
- assert batch != null : "Encountered a null batch during merge and
spill operation";
- batchesToSpill.add(batch);
- spillSize += batch.getDataSize();
+ batchesToSpill.add(batchGroups.pollFirst());
--- End diff --
This won't happen for first-generation spills due to the check in
`isSpillNeeded`. But, it could happen when spilling for merges, so I fixed the
code.
> Roll-up of final fixes for managed sort
> ---------------------------------------
>
> Key: DRILL-5284
> URL: https://issues.apache.org/jira/browse/DRILL-5284
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> The managed external sort was introduced in DRILL-5080. Since that time,
> extensive testing has identified a number of minor fixes and improvements.
> Given the long PR cycles, it is not practical to spend a week or two to do a
> PR for each fix individually. This ticket represents a roll-up of a
> combination of a number of fixes. Small fixes are listed here, larger items
> appear as sub-tasks.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)