[jira] [Commented] (DRILL-5601) Rollup of External Sort memory management fixes

ASF GitHub Bot (JIRA) Wed, 19 Jul 2017 11:00:27 -0700

    [ https://issues.apache.org/jira/browse/DRILL-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093508#comment-16093508 ]


ASF GitHub Bot commented on DRILL-5601:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/860#discussion_r128129255
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/PriorityQueueCopierWrapper.java ---
    @@ -245,29 +250,35 @@ private BatchMerger(PriorityQueueCopierWrapper holder, BatchSchema schema, List<
     
         @Override
         public boolean next() {
    -      Stopwatch w = Stopwatch.createStarted();
           long start = holder.getAllocator().getAllocatedMemory();
    +
    +      // Allocate an outgoing container the "dumb" way (based on static sizes)
    +      // for testing, or the "smart" way (based on actual observed data sizes)
    +      // for production code.
    +
    +      if (allocHelper == null) {
    +        VectorAccessibleUtilities.allocateVectors(outputContainer, targetRecordCount);
    +      } else {
    +        allocHelper.allocateBatch(outputContainer, targetRecordCount);
    +      }
    +      logger.trace("Initial output batch allocation: {} bytes",
    +                   holder.getAllocator().getAllocatedMemory() - start);
    +      Stopwatch w = Stopwatch.createStarted();
           int count = holder.copier.next(targetRecordCount);
    -      copyCount += count;
           if (count > 0) {
             long t = w.elapsed(TimeUnit.MICROSECONDS);
             batchCount++;
    -        logger.trace("Took {} us to merge {} records", t, count);
             long size = holder.getAllocator().getAllocatedMemory() - start;
    +        logger.trace("Took {} us to merge {} records, consuming {} bytes of memory",
    +                     t, count, size);
             estBatchSize = Math.max(estBatchSize, size);
           } else {
             logger.trace("copier returned 0 records");
           }
     
    -      // Identify the schema to be used in the output container. (Since
    -      // all merged batches have the same schema, the schema we identify
    -      // here should be the same as that which we already had.
    +      // Initialize output container metadata.
    --- End diff --
    
    The removed comments were actually a bit off the mark and reflected a
    partial understanding. buildSchema() simply copies the schema from the
    vectors into the schema for the batch; it has nothing (directly) to do
    with the schema of incoming batches.
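
To illustrate the point, here is a minimal sketch of a container deriving its schema purely from the vectors it currently holds. Vector, Container and the "name:type" strings are hypothetical stand-ins chosen for this example; they are not Drill's actual classes, and this is not the real buildSchema() implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT Drill's classes.
final class SchemaSketch {

    /** A column vector that carries its own field metadata. */
    static final class Vector {
        final String name;
        final String type;

        Vector(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    /** A batch container whose schema is derived from its vectors. */
    static final class Container {
        private final List<Vector> vectors = new ArrayList<>();
        private List<String> schema = new ArrayList<>();

        void add(Vector v) {
            vectors.add(v);
        }

        /**
         * Copies field metadata from the vectors into the batch schema.
         * Nothing here looks at incoming batches: the schema is whatever
         * the container's own vectors say it is.
         */
        void buildSchema() {
            List<String> fields = new ArrayList<>();
            for (Vector v : vectors) {
                fields.add(v.name + ":" + v.type);
            }
            schema = fields;
        }

        List<String> schema() {
            return schema;
        }
    }
}
```

In this sketch the schema stays empty until buildSchema() is called, which is why the call acts as "initialize output container metadata" rather than as a check against the input schema.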


> Rollup of External Sort memory management fixes
> -----------------------------------------------
>
>                 Key: DRILL-5601
>                 URL: https://issues.apache.org/jira/browse/DRILL-5601
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.11.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> Rollup of a set of specific JIRA entries that all relate to the very 
> difficult problem of managing memory within Drill in order for the external 
> sort to stay within a memory budget. In general, the fixes relate to better 
> estimating memory used by the three ways that Drill allocates vector memory 
> (see DRILL-5522) and to predicting the size of vectors that the sort will 
> create, to avoid repeated realloc-copy cycles (see DRILL-5594).
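
As a rough illustration of why predicting vector sizes matters, the sketch below counts realloc-copy cycles for a buffer that grows by doubling. The doubling policy, the class name ReallocSketch, and the byte counts are assumptions made for this example; they are not measurements or code taken from Drill.

```java
// Hypothetical model of a buffer that doubles when it runs out of room.
final class ReallocSketch {

    /** Number of reallocations needed to grow from initialCapacity to at least neededBytes. */
    static int reallocCount(long initialCapacity, long neededBytes) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        int count = 0;
        long capacity = initialCapacity;
        while (capacity < neededBytes) {
            capacity *= 2; // assumed growth policy: double on overflow
            count++;
        }
        return count;
    }

    /** Bytes copied across all reallocations, assuming the old buffer is full each time. */
    static long bytesCopied(long initialCapacity, long neededBytes) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        long copied = 0;
        long capacity = initialCapacity;
        while (capacity < neededBytes) {
            copied += capacity; // old contents are copied into the new buffer
            capacity *= 2;
        }
        return copied;
    }
}
```

Under these assumptions, growing from 4096 to 65536 bytes takes 4 reallocations and copies 61440 bytes, while a buffer pre-sized to 65536 bytes from observed data needs none.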



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
