The error you are seeing is the application master (the Tez AM that runs
your Hive query) running out of memory. You can raise its memory with the
following settings (tune as appropriate).

set tez.am.resource.memory.mb=3072;
set tez.am.launch.cmd-opts=-Xmx2560m;
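
If you want to try other sizes, here is a rough sketch of how the two values
relate: the JVM heap (-Xmx) is kept below the container size so that non-heap
memory still fits inside the container. The helper name am_settings is
hypothetical, and the 5/6 ratio is only what 2560/3072 happens to work out to.
It is an assumption for illustration, not an official Tez recommendation.

```python
def am_settings(container_mb, heap_num=5, heap_den=6):
    """Build Hive 'set' lines for the Tez AM container and its JVM heap.

    The default heap ratio (5/6) is an assumption taken from the
    3072 MB container / 2560 MB heap pair suggested above.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < heap_num < heap_den:
        raise ValueError("heap ratio must be strictly between 0 and 1")
    # Integer arithmetic avoids floating-point rounding surprises.
    heap_mb = container_mb * heap_num // heap_den
    return [
        f"set tez.am.resource.memory.mb={container_mb};",
        f"set tez.am.launch.cmd-opts=-Xmx{heap_mb}m;",
    ]


if __name__ == "__main__":
    # Print the settings for a 3072 MB AM container.
    for line in am_settings(3072):
        print(line)
```

Paste the printed lines at the top of the Hive script, before the query.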

Also, there have been some improvements in newer Tez releases for getTask
running out of memory. Could you share which version of Tez you are running?

Regards,
jeagles

On Tue, Dec 8, 2015 at 2:22 PM, Rohit Garg <[email protected]> wrote:

> I have a script which runs fine on hive 13(YARN)
> I am experimenting with tez. When I run a query on large dataset , I run
> into the following error.
>
>       0 FATAL [Socket Reader #1 for port 55739]
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Socket
> Reader #1 for port 55739,5,main] threw an Error.  Shutting down now...
>      java.lang.OutOfMemoryError: GC overhead limit exceeded
>      at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
>      at
> org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1510)
>      at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:750)
>      at
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:624)
>      at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:595)
>      2015-12-07 20:31:32,859 FATAL [AsyncDispatcher event handler]
> org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
>      java.lang.OutOfMemoryError: GC overhead limit exceeded
>      2015-12-07 20:31:30,590 WARN [IPC Server handler 0 on 55739]
> org.apache.hadoop.ipc.Server: IPC Server handler 0 on 55739, call
> heartbeat({  containerId=container_1449516549171_0001_01_000100,
> requestId=10184, startIndex=0, maxEventsToGet=0, taskAttemptId=null,
> eventCount=0 }), rpc version=2, client version=19,
> methodsFingerPrint=557389974 from 10.10.30.35:47028 Call#11165 Retry#0:
> error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>      java.lang.OutOfMemoryError: GC overhead limit exceeded
>      at
>
> javax.security.auth.SubjectDomainCombiner.optimize(SubjectDomainCombiner.java:464)
>      at
>
> javax.security.auth.SubjectDomainCombiner.combine(SubjectDomainCombiner.java:267)
>      at
>
> java.security.AccessControlContext.goCombiner(AccessControlContext.java:499)
>      at
> java.security.AccessControlContext.optimize(AccessControlContext.java:407)
>      at
> java.security.AccessController.getContext(AccessController.java:501)
>      at javax.security.auth.Subject.doAs(Subject.java:412)
>      at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>      2015-12-07 20:32:53,495 INFO [Thread-60]
> amazon.emr.metrics.MetricsSaver: Saved 4:3 records to
> /mnt/var/em/raw/i-782f08c8_20151207_7921_07921_raw.bin
>      2015-12-07 20:32:53,495 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
>      2015-12-07 20:32:50,435 INFO [IPC Server handler 20 on 55739]
> org.apache.hadoop.ipc.Server: IPC Server handler 20 on 55739, call
> getTask(org.apache.tez.common.ContainerContext@409a6aa9), rpc version=2,
> client version=19, methodsFingerPrint=557389974 from 10.10.30.33:33644
> Call#11094
> Retry#0: error: java.io.IOException: java.lang.OutOfMemoryError: GC
> overhead limit exceeded
>      java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit
> exceeded
>      2015-12-07 20:32:29,117 WARN [IPC Server handler 23 on 55739]
> org.apache.hadoop.ipc.Server: IPC Server handler 23 on 55739, call
> getTask(org.apache.tez.common.ContainerContext@7c7e6992), rpc version=2,
> client version=19, methodsFingerPrint=557389974 from 10.10.30.38:44218
> Call#11260
> Retry#0: error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>      java.lang.OutOfMemoryError: GC overhead limit exceeded
>      2015-12-07 20:32:53,497 INFO [Thread-60]
> amazon.emr.metrics.MetricsSaver: Saved 1:1 records to
> /mnt/var/em/raw/i-782f08c8_20151207_7921_07921_raw.bin
>      2015-12-07 20:32:53,498 INFO [Thread-61]
> amazon.emr.metrics.MetricsSaver: Saved 1:1 records to
> /mnt/var/em/raw/i-782f08c8_20151207_7921_07921_raw.bin
>      2015-12-07 20:32:53,498 INFO [Thread-2]
> org.apache.tez.dag.app.DAGAppMaster: DAGAppMaster received a signal.
> Signaling TaskScheduler
>      2015-12-07 20:32:53,498 INFO [Thread-2]
> org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: TaskScheduler notified
> that iSignalled was : true
>      2015-12-07 20:32:53,499 INFO [Thread-2]
> org.apache.tez.dag.history.HistoryEventHandler: Stopping
> HistoryEventHandler
>      2015-12-07 20:32:53,499 INFO [Thread-2]
> org.apache.tez.dag.history.recovery.RecoveryService: Stopping
> RecoveryService
>      2015-12-07 20:32:53,499 INFO [Thread-2]
> org.apache.tez.dag.history.recovery.RecoveryService: Closing Summary Stream
>      2015-12-07 20:32:53,499 INFO [LeaseRenewer:[email protected]:9000]
> org.apache.hadoop.util.ExitUtil: Halt with status -1 Message: HaltException
>
> Some specs about the EMR cluster -  m1.xlarge master node, 4 r3.8xlarge
> core nodes, 2 r3.8xlarge task nodes (about 1.3 TB memory)
>
> I have tried the following settings but they don't work.
>
> set tez.task.resource.memory.mb=8000;
> SET hive.tez.container.size=30208;
> SET hive.tez.java.opts=-Xmx24168m;
>
> Can anyone please help on how to fix it or point me in right direction if I
> am missing something.
>
