[
https://issues.apache.org/jira/browse/DRILL-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039797#comment-16039797
]
ASF GitHub Bot commented on DRILL-5325:
---------------------------------------
Github user paul-rogers commented on a diff in the pull request:
https://github.com/apache/drill/pull/808#discussion_r120497872
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortMemoryManager.java
---
@@ -0,0 +1,506 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.xsort.managed;
+
+public class SortMemoryManager {
+
+ /**
+ * Maximum memory this operator may use. Usually comes from the
+ * operator definition, but may be overridden by a configuration
+ * parameter for unit testing.
+ */
+
+ private final long memoryLimit;
+
+ /**
+ * Estimated size of the records for this query, updated on each
+ * new batch received from upstream.
+ */
+
+ private int estimatedRowWidth;
+
+ /**
+ * Size of the merge batches that this operator produces. Generally
+ * the same as the merge batch size, unless low memory forces a smaller
+ * value.
+ */
+
+ private int expectedMergeBatchSize;
+
+ /**
+ * Estimate of the input batch size based on the largest batch seen
+ * thus far.
+ */
+ private int estimatedInputBatchSize;
+
+ /**
+ * Maximum memory level before spilling occurs. That is, we can buffer
input
+ * batches in memory until we reach the level given by the buffer memory
pool.
+ */
+
+ private long bufferMemoryLimit;
+
+ /**
+ * Maximum memory that can hold batches during the merge
+ * phase.
+ */
+
+ private long mergeMemoryLimit;
+
+ /**
+ * The target size for merge batches sent downstream.
+ */
+
+ private int preferredMergeBatchSize;
+
+ /**
+ * The configured size for each spill batch.
+ */
+ private int preferredSpillBatchSize;
+
+ /**
+ * Estimated number of rows that fit into a single spill batch.
+ */
+
+ private int spillBatchRowCount;
+
+ /**
+ * The estimated actual spill batch size which depends on the
+ * details of the data rows for any particular query.
+ */
+
+ private int expectedSpillBatchSize;
+
+ /**
+ * The number of records to add to each output batch sent to the
+ * downstream operator or spilled to disk.
+ */
+
+ private int mergeBatchRowCount;
+
+ private SortConfig config;
+
+// private long spillPoint;
+
+// private long minMergeMemory;
+
+ private int estimatedInputSize;
+
+ private boolean potentialOverflow;
+
+ public SortMemoryManager(SortConfig config, long memoryLimit) {
+ this.config = config;
+
+ // The maximum memory this operator can use as set by the
+ // operator definition (propagated to the allocator.)
+
+ if (config.maxMemory() > 0) {
+ this.memoryLimit = Math.min(memoryLimit, config.maxMemory());
+ } else {
+ this.memoryLimit = memoryLimit;
+ }
+
+ preferredSpillBatchSize = config.spillBatchSize();;
+ preferredMergeBatchSize = config.mergeBatchSize();
+ }
+
+ /**
+ * Update the data-driven memory use numbers including:
+ * <ul>
+ * <li>The average size of incoming records.</li>
+ * <li>The estimated spill and output batch size.</li>
+ * <li>The estimated number of average-size records per
+ * spill and output batch.</li>
+ * <li>The amount of memory set aside to hold the incoming
+ * batches before spilling starts.</li>
+ * </ul>
+ * <p>
+ * Under normal circumstances, the amount of memory available is much
+ * larger than the input, spill or merge batch sizes. The primary
question
+ * is to determine how many input batches we can buffer during the load
+ * phase, and how many spill batches we can merge during the merge
+ * phase.
+ *
+ * @param batchSize the overall size of the current batch received from
+ * upstream
+ * @param batchRowWidth the width in bytes (including overhead) of each
--- End diff --
Fixed.
> Implement sub-operator unit tests for managed external sort
> -----------------------------------------------------------
>
> Key: DRILL-5325
> URL: https://issues.apache.org/jira/browse/DRILL-5325
> Project: Apache Drill
> Issue Type: Improvement
> Components: Tools, Build & Test
> Affects Versions: 1.11.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> Validate the proposed sub-operator test framework, by creating low-level unit
> tests for the managed version of the external sort.
> The external sort has a small number of existing tests, but those tests are
> quite superficial; the "managed sort" project found many bugs. The managed
> sort itself was tested with ad-hoc system-level tests created using the new
> "cluster fixture" framework. But, again, such tests could not reach deep
> inside the sort code to exercise very specific conditions.
> As a result, we spent far too much time using QA functional tests to identify
> specific code issues.
> Using the sub-opeator unit test framework, we can instead test each bit of
> functionality at the unit test level.
> If doing so works, and is practical, it can serve as a model for other
> operator testing projects.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)