[ 
https://issues.apache.org/jira/browse/DRILL-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039797#comment-16039797
 ] 

ASF GitHub Bot commented on DRILL-5325:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/808#discussion_r120497872
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/SortMemoryManager.java
 ---
    @@ -0,0 +1,506 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.impl.xsort.managed;
    +
    +public class SortMemoryManager {
    +
    +  /**
    +   * Maximum memory this operator may use. Usually comes from the
    +   * operator definition, but may be overridden by a configuration
    +   * parameter for unit testing.
    +   */
    +
    +  private final long memoryLimit;
    +
    +  /**
    +   * Estimated size of the records for this query, updated on each
    +   * new batch received from upstream.
    +   */
    +
    +  private int estimatedRowWidth;
    +
    +  /**
    +   * Size of the merge batches that this operator produces. Generally
    +   * the same as the merge batch size, unless low memory forces a smaller
    +   * value.
    +   */
    +
    +  private int expectedMergeBatchSize;
    +
    +  /**
    +   * Estimate of the input batch size based on the largest batch seen
    +   * thus far.
    +   */
    +  private int estimatedInputBatchSize;
    +
    +  /**
    +   * Maximum memory level before spilling occurs. That is, we can buffer 
input
    +   * batches in memory until we reach the level given by the buffer memory 
pool.
    +   */
    +
    +  private long bufferMemoryLimit;
    +
    +  /**
    +   * Maximum memory that can hold batches during the merge
    +   * phase.
    +   */
    +
    +  private long mergeMemoryLimit;
    +
    +  /**
    +   * The target size for merge batches sent downstream.
    +   */
    +
    +  private int preferredMergeBatchSize;
    +
    +  /**
    +   * The configured size for each spill batch.
    +   */
    +  private int preferredSpillBatchSize;
    +
    +  /**
    +   * Estimated number of rows that fit into a single spill batch.
    +   */
    +
    +  private int spillBatchRowCount;
    +
    +  /**
    +   * The estimated actual spill batch size which depends on the
    +   * details of the data rows for any particular query.
    +   */
    +
    +  private int expectedSpillBatchSize;
    +
    +  /**
    +   * The number of records to add to each output batch sent to the
    +   * downstream operator or spilled to disk.
    +   */
    +
    +  private int mergeBatchRowCount;
    +
    +  private SortConfig config;
    +
    +//  private long spillPoint;
    +
    +//  private long minMergeMemory;
    +
    +  private int estimatedInputSize;
    +
    +  private boolean potentialOverflow;
    +
    +  public SortMemoryManager(SortConfig config, long memoryLimit) {
    +    this.config = config;
    +
    +    // The maximum memory this operator can use as set by the
    +    // operator definition (propagated to the allocator.)
    +
    +    if (config.maxMemory() > 0) {
    +      this.memoryLimit = Math.min(memoryLimit, config.maxMemory());
    +    } else {
    +      this.memoryLimit = memoryLimit;
    +    }
    +
    +    preferredSpillBatchSize = config.spillBatchSize();;
    +    preferredMergeBatchSize = config.mergeBatchSize();
    +  }
    +
    +  /**
    +   * Update the data-driven memory use numbers including:
    +   * <ul>
    +   * <li>The average size of incoming records.</li>
    +   * <li>The estimated spill and output batch size.</li>
    +   * <li>The estimated number of average-size records per
    +   * spill and output batch.</li>
    +   * <li>The amount of memory set aside to hold the incoming
    +   * batches before spilling starts.</li>
    +   * </ul>
    +   * <p>
    +   * Under normal circumstances, the amount of memory available is much
    +   * larger than the input, spill or merge batch sizes. The primary 
question
    +   * is to determine how many input batches we can buffer during the load
    +   * phase, and how many spill batches we can merge during the merge
    +   * phase.
    +   *
    +   * @param batchSize the overall size of the current batch received from
    +   * upstream
    +   * @param batchRowWidth the width in bytes (including overhead) of each
    --- End diff --
    
    Fixed.


> Implement sub-operator unit tests for managed external sort
> -----------------------------------------------------------
>
>                 Key: DRILL-5325
>                 URL: https://issues.apache.org/jira/browse/DRILL-5325
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Tools, Build & Test
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.11.0
>
>
> Validate the proposed sub-operator test framework, by creating low-level unit 
> tests for the managed version of the external sort.
> The external sort has a small number of existing tests, but those tests are 
> quite superficial; the "managed sort" project found many bugs. The managed 
> sort itself was tested with ad-hoc system-level tests created using the new 
> "cluster fixture" framework. But, again, such tests could not reach deep 
> inside the sort code to exercise very specific conditions.
> As a result, we spent far too much time using QA functional tests to identify 
> specific code issues.
> Using the sub-opeator unit test framework, we can instead test each bit of 
> functionality at the unit test level.
> If doing so works, and is practical, it can serve as a model for other 
> operator testing projects.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to