[ 
https://issues.apache.org/jira/browse/DRILL-5226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895430#comment-15895430
 ] 

ASF GitHub Bot commented on DRILL-5226:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/767#discussion_r104275259
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java
 ---
    @@ -1333,8 +1339,43 @@ private void spillFromMemory() {
         mergeAndSpill(bufferedBatches, spillCount);
       }
     
    +  private void mergeRuns(int targetCount) {
    +
    +    // Determine the number of runs to merge. The count should be the
    +    // target count. However, to prevent possible memory overrun, we
    +    // double-check with actual spill batch size and only spill as much
    +    // as fits in the merge memory pool.
    +
    +    int mergeCount = 0;
    +    long mergeSize = 0;
    +    for (SpilledRun batch : spilledRuns) {
    +      long batchSize = batch.getBatchSize();
    +      if (mergeSize + batchSize > mergeMemoryPool) {
    +        break;
    +      }
    +      mergeSize += batchSize;
    +      mergeCount++;
    +      if (mergeCount == targetCount) {
    +        break;
    +      }
    +    }
    +
    +    // Must always spill at least 2, even if this creates an over-size
    +    // spill file. But, if this is a final consolidation, we may have only
    +    // a single batch.
    +
    +    mergeCount = Math.max(mergeCount, 2);
    +    mergeCount = Math.min(mergeCount, spilledRuns.size());
    +
    +    // Do the actual spill.
    +
    +    mergeAndSpill(spilledRuns, mergeCount);
    --- End diff --
    
    The logic is a bit more complex. We use the size of the largest row as our 
row size estimate. But, when we spill, we also consider the actual batch sizes. 
Earlier versions of the code were more sensitive to the row size estimate, more 
recent revisions have evolved to favor actual sizes in many cases.
    
    For "first generation" batches, we use the actual "payload" amount of data 
in each batch, sum that amount, and stop when we have enough data to fill a 
spill file. Thus, first generation spills are immune to the problem you 
describe.
    
    For merge/spills, we guess a target number of batches based on the target 
spill batch size. If the above logic works, then we know (roughly) the size of 
the spilled batches -- because we created them. As a sanity check, new code 
above sums the actual batch sizes in the files to merge to ensure that they fit 
in memory, choosing a smaller-than-requested number of batches if the merge 
batches turn out to be a bit larger than expected.


> External Sort encountered an error while spilling to disk
> ---------------------------------------------------------
>
>                 Key: DRILL-5226
>                 URL: https://issues.apache.org/jira/browse/DRILL-5226
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0
>            Reporter: Rahul Challapalli
>            Assignee: Paul Rogers
>         Attachments: 277578d5-8bea-27db-0da1-cec0f53a13df.sys.drill, 
> profile_scenario3.sys.drill, scenario3.log
>
>
> Environment : 
> {code}
> git.commit.id.abbrev=2af709f
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> Nodes in Mapr Cluster : 1
> Data Size : ~ 0.35 GB
> No of Columns : 1
> Width of column : 256 chars
> {code}
> The below query fails before spilling to disk due to wrong estimates of the 
> record batch size.
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.width.max_per_node` = 1;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | planner.width.max_per_node updated.  |
> +-------+--------------------------------------+
> 1 row selected (1.11 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.memory.max_query_memory_per_node` = 62914560;
> +-------+----------------------------------------------------+
> |  ok   |                      summary                       |
> +-------+----------------------------------------------------+
> | true  | planner.memory.max_query_memory_per_node updated.  |
> +-------+----------------------------------------------------+
> 1 row selected (0.362 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.disable_exchanges` = true;
> +-------+-------------------------------------+
> |  ok   |               summary               |
> +-------+-------------------------------------+
> | true  | planner.disable_exchanges updated.  |
> +-------+-------------------------------------+
> 1 row selected (0.277 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select * from (select * from 
> dfs.`/drill/testdata/resource-manager/250wide-small.tbl` order by 
> columns[0])d where d.columns[0] = 'ljdfhwuehnoiueyf';
> Error: RESOURCE ERROR: External Sort encountered an error while spilling to 
> disk
> Unable to allocate buffer of size 1048576 (rounded from 618889) due to memory 
> limit. Current allocation: 62736000
> Fragment 0:0
> [Error Id: 1bb933c8-7dc6-4cbd-8c8e-0e095baac719 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> Exception from the logs
> {code}
> 2017-01-26 15:33:09,307 [277578d5-8bea-27db-0da1-cec0f53a13df:frag:0:0] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: External Sort 
> encountered an error while spilling to disk (Unable to allocate buffer of 
> size 1048576 (rounded from 618889) due to memory limit. Current allocation: 
> 62736000)
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: External 
> Sort encountered an error while spilling to disk
> Unable to allocate buffer of size 1048576 (rounded from 618889) due to memory 
> limit. Current allocation: 62736000
> [Error Id: 1bb933c8-7dc6-4cbd-8c8e-0e095baac719 ]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
>  ~[drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:603)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:411)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:93)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
> [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:226)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) 
> [na:1.7.0_111]
>         at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_111]
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>  [hadoop-common-2.7.0-mapr-1607.jar:na]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:226)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_111]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_111]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_111]
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
> allocate buffer of size 1048576 (rounded from 618889) due to memory limit. 
> Current allocation: 62736000
>         at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216) 
> ~[drill-memory-base-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:191) 
> ~[drill-memory-base-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStream(VectorAccessibleSerializable.java:112)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.xsort.BatchGroup.getBatch(BatchGroup.java:111)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.xsort.BatchGroup.getNextIndex(BatchGroup.java:137)
>  ~[drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.test.generated.PriorityQueueCopierGen7.next(PriorityQueueCopierTemplate.java:76)
>  ~[na:na]
>         at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:590)
>  [drill-java-exec-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
>         ... 45 common frames omitted
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to