[ 
https://issues.apache.org/jira/browse/DRILL-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184913#comment-16184913
 ] 

Boaz Ben-Zvi commented on DRILL-5824:
-------------------------------------

The problem seems to be in the 1st phase Hash Aggr: this operator does not 
check the "fallback" option; it simply falls back automatically whenever 
num_partitions = 1, setting the memory limit to 10 GB. This happens either 
when the user sets num_partitions to 1, or when there is too little memory. 
In the latter case, the 1st phase would allocate unlimited memory (and never 
"return early"), so when the 2nd phase starts it would likely fail due to 
too little memory (no fallback).
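
To make the behavior described above concrete, here is a minimal sketch in 
Java. It is not the Drill source: the class name, method names, and the 
boolean flag are hypothetical, and the only values taken from this comment 
are the num_partitions = 1 condition and the 10 GB limit. The first method 
mirrors the behavior as reported; the second shows one possible shape of a 
check that consults the fallback option first. The actual fix may differ.

```java
// Hypothetical sketch, not actual Drill code.
class HashAggLimitSketch {

    // The 10 GB limit mentioned in the comment above.
    static final long FALLBACK_LIMIT_BYTES = 10L * 1024 * 1024 * 1024;

    // Behavior as reported: with a single partition the operator silently
    // raises its limit to 10 GB, without consulting the fallback option.
    static long limitAsReported(long assignedLimitBytes, int numPartitions) {
        if (numPartitions == 1) {
            return FALLBACK_LIMIT_BYTES;
        }
        return assignedLimitBytes;
    }

    // One possible alternative (an assumption): raise the limit only when
    // the fallback option is enabled, and fail fast otherwise.
    static long limitWithFallbackCheck(long assignedLimitBytes,
                                       int numPartitions,
                                       boolean fallbackEnabled) {
        if (numPartitions > 1) {
            return assignedLimitBytes;
        }
        if (!fallbackEnabled) {
            throw new IllegalStateException(
                "Too little memory to partition and fallback is disabled");
        }
        return FALLBACK_LIMIT_BYTES;
    }
}
```

With a 2 GB assigned limit and num_partitions = 1, limitAsReported returns 
the 10 GB value regardless of any option, which is the behavior this comment 
describes; limitWithFallbackCheck returns it only when the flag is true.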


> 1st phase Hash Aggregate allocates more memory than the limit
> -------------------------------------------------------------
>
>                 Key: DRILL-5824
>                 URL: https://issues.apache.org/jira/browse/DRILL-5824
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.11.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>             Fix For: 1.12.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The per-query memory limit was set to 2G, but the memory limit set for the 
> 1st phase hash agg operator was larger than that:
> AGGR OOM at First Phase. Partitions: 32. Estimated batch size: 4784128. 
> values size: 3670016. Output alloc size: 3670016. Planned batches: 1 Memory 
> limit: 2680684544 so far allocated: 374341632.
> Fragment 3:0
> [Error Id: b22fe6ad-b805-433c-bae7-c0f60c30bb99 on 10.10.30.168:31010]
> (org.apache.drill.exec.exception.OutOfMemoryException) AGGR OOM at First 
> Phase. Partitions: 32. Estimated batch size: 4784128. values size: 3670016. 
> Output alloc size: 3670016. Planned batches: 1 Memory limit: 2680684544 so 
> far allocated: 374341632.
> org.apache.drill.exec.test.generated.HashAggregatorGen5265.checkGroupAndAggrValues():1350
> org.apache.drill.exec.test.generated.HashAggregatorGen5265.doWork():591
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():169
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
> org.apache.drill.exec.record.AbstractRecordBatch.next():164
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
