[jira] [Commented] (DRILL-5669) Multiple TPCH queries failed due to OOM

ASF GitHub Bot (JIRA) Tue, 18 Jul 2017 16:04:32 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092300#comment-16092300
 ]


ASF GitHub Bot commented on DRILL-5669:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/879
  
    Increasing the minimum means each sort uses more memory than before. This 
means that the query uses more memory than the 2 GB the user has allotted. 
Drill cannot create new memory, so that memory is coming from somewhere. So, 
you are right that increasing memory has impact, but we've already incurred 
that impact by increasing the minimum (that's why we did it.)
    
    So, I'm asking if we should increase the aggregate maximum to reflect that 
memory use is no longer constrained by the per-query maximum.


> Multiple TPCH queries failed due to OOM
> ---------------------------------------
>
>                 Key: DRILL-5669
>                 URL: https://issues.apache.org/jira/browse/DRILL-5669
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>         Environment: RHEL 6.4 2.6.32-358.el6.x86_64, 10+1 nodes cluster
>            Reporter: Dechang Gu
>            Assignee: Boaz Ben-Zvi
>              Labels: ready-to-commit
>             Fix For: 1.11.0
>
>         Attachments: 26999476-174e-98fd-e21e-fd53f79284c7.sys.drill
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> Running TPCH SF100 Parquet (and CSV) tests, multiple queries failed due to 
> OOM. For example, Q16 hit the following error:
> {code}
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> Unable to allocate sv2 for 65536 records, and not enough batchGroups to spill.
> batchGroups.size 1
> spilledBatchGroups.size 0
> allocated memory 23500416
> allocator limit 20000000
> Fragment 1:11
> [Error Id: e58161a6-2383-48b1-a350-50db1b5408c6 on ucs-node10.perf.lab:31010]
>         at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:489)
>         at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:593)
>         at 
> org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:215)
>         at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:140)
>         at PipSQueak.fetchRows(PipSQueak.java:420)
>         at PipSQueak.runTest(PipSQueak.java:116)
>         at PipSQueak.main(PipSQueak.java:556)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: RESOURCE 
> ERROR: One or more nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 65536 records, and not enough batchGroups to spill.
> batchGroups.size 1
> spilledBatchGroups.size 0
> allocated memory 23500416
> allocator limit 20000000
> Fragment 1:11
> {code}
> And in drillbit.log:
> {code}
> 2017-07-12 11:34:11,670 ucs-node10.perf.lab 
> [26999476-174e-98fd-e21e-fd53f79284c7:frag:1:11] INFO  
> o.a.d.e.p.i.xsort.ExternalSortBatch - User Error Occurred: One or more nodes 
> ran out of memory while executing the query.
> org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
> nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 65536 records, and not enough batchGroups to spill.
> batchGroups.size 1
> spilledBatchGroups.size 0
> allocated memory 23500416
> allocator limit 20000000
> [Error Id: e58161a6-2383-48b1-a350-50db1b5408c6 ]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
>  ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.newSV2(ExternalSortBatch.java:639)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:381)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext(StreamingAggBatch.java:140)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105) 
> [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:144)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95) 
> [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) 
> [na:1.7.0_65]
>         at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_65]
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>  [hadoop-common-2.7.0-mapr-1607.jar:na]
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
>  [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_65]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_65]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_65]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (DRILL-5669) Multiple TPCH queries failed due to OOM

Reply via email to