[ 
https://issues.apache.org/jira/browse/TEZ-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3468:
-----------------------------
    Description: 
Tez AM throws a Dispatcher error during shutdown when running LLAP queries. 

{code:java}
2016-10-13 09:14:24,999 [INFO] [LlapScheduler] 
|tezplugins.LlapTaskSchedulerService|: ScheduleResult for Task: 
TaskInfo{task=attempt_1476340082683_0013_137_01_000019_0, priority=14, 
startTime=0, containerId=null, assignedInstance=null, uniqueId=7505, 
localityDelayTimeout=9223372036854775807} = DELAYED_RESOURCES
2016-10-13 09:14:24,999 [INFO] [LlapScheduler] 
|tezplugins.LlapTaskSchedulerService|: Preempting for 
task=attempt_1476340082683_0013_137_01_000019_0 on any available host
2016-10-13 09:14:24,999 [INFO] [LlapScheduler] 
|tezplugins.LlapTaskSchedulerService|: No tasks qualify as killable to schedule 
tasks at priority 14
2016-10-13 09:14:24,999 [INFO] [Socket Reader #1 for port 36800] |ipc.Server|: 
Auth successful for application_1476340082683_0013 (auth:SIMPLE)
2016-10-13 09:14:24,999 [ERROR] [Dispatcher thread {Central}] 
|common.AsyncDispatcher|: Error in dispatcher thread
org.apache.tez.dag.api.TezUncheckedException: Cannot get ApplicationACLs before 
all services have started
        at 
org.apache.tez.dag.app.DAGAppMaster$RunningAppContext.getApplicationACLs(DAGAppMaster.java:1711)
        at 
org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:491)
        at 
org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:448)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at 
org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:419)
        at 
org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:76)
        at 
org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:60)
        at 
org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:36)
        at 
org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
2016-10-13 09:14:24,999 [WARN] [Dispatcher thread {Central}] 
|container.AMContainerImpl|: Unexpected TA Assignment: TAId: 
attempt_1476340082683_0013_137_01_000017_3  for ContainerId: 
container_222212222_0013_01_007508 while in state: ALLOCATED
{code}


  was:
Discovered this issue while running some ACID queries concurrently over on 10 
threads. TezTask failed with the below error:
{code:java}
2016-10-13 09:14:24,999 [INFO] [LlapScheduler] 
|tezplugins.LlapTaskSchedulerService|: ScheduleResult for Task: 
TaskInfo{task=attempt_1476340082683_0013_137_01_000019_0, priority=14, 
startTime=0, containerId=null, assignedInstance=null, uniqueId=7505, 
localityDelayTimeout=9223372036854775807} = DELAYED_RESOURCES
2016-10-13 09:14:24,999 [INFO] [LlapScheduler] 
|tezplugins.LlapTaskSchedulerService|: Preempting for 
task=attempt_1476340082683_0013_137_01_000019_0 on any available host
2016-10-13 09:14:24,999 [INFO] [LlapScheduler] 
|tezplugins.LlapTaskSchedulerService|: No tasks qualify as killable to schedule 
tasks at priority 14
2016-10-13 09:14:24,999 [INFO] [Socket Reader #1 for port 36800] |ipc.Server|: 
Auth successful for application_1476340082683_0013 (auth:SIMPLE)
2016-10-13 09:14:24,999 [ERROR] [Dispatcher thread {Central}] 
|common.AsyncDispatcher|: Error in dispatcher thread
org.apache.tez.dag.api.TezUncheckedException: Cannot get ApplicationACLs before 
all services have started
        at 
org.apache.tez.dag.app.DAGAppMaster$RunningAppContext.getApplicationACLs(DAGAppMaster.java:1711)
        at 
org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:491)
        at 
org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:448)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at 
org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:419)
        at 
org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:76)
        at 
org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:60)
        at 
org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:36)
        at 
org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
        at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
        at java.lang.Thread.run(Thread.java:745)
2016-10-13 09:14:24,999 [WARN] [Dispatcher thread {Central}] 
|container.AMContainerImpl|: Unexpected TA Assignment: TAId: 
attempt_1476340082683_0013_137_01_000017_3  for ContainerId: 
container_222212222_0013_01_007508 while in state: ALLOCATED
{code}
{code}
2016-10-13 09:14:24,929 [INFO] [pool-1-thread-1] |app.DAGAppMaster|: 
DAGAppMasterShutdownHook invoked
2016-10-13 09:14:24,929 [INFO] [pool-1-thread-1] |app.DAGAppMaster|: 
DAGAppMaster received a signal. Signaling TaskScheduler
2016-10-13 09:14:24,931 [INFO] [pool-1-thread-1] |rm.TaskSchedulerManager|: 
TaskScheduler notified that iSignalled was : true
{code}

{code}
2016-10-13 09:14:25,089 [WARN] [AMRM Heartbeater thread] 
|retry.RetryInvocationHandler|: Exception while invoking 
ApplicationMasterProtocolPBClientImpl.allocate over null. Not retrying because 
try once and fail.
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: 
Application attempt appattempt_1476340082683_0013_000
001 doesn't exist in ApplicationMasterService cache.
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:445)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBSer
viceImpl.java:60)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationM
asterProtocol.java:99)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
at com.sun.proxy.$Proxy42.allocate(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277)
at 
org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:237)
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException):
 Application attempt appattempt_1476340082683_0013_000001 doesn't exist in 
ApplicationMasterService cache.
at 
org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:445)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at 
org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
at org.apache.hadoop.ipc.Client.call(Client.java:1496)
at org.apache.hadoop.ipc.Client.call(Client.java:1396)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy41.allocate(Unknown Source)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
... 9 more
2016-10-13 09:14:25,090 [INFO] [AMRM Heartbeater thread] 
|impl.AMRMClientAsyncImpl|: Shutdown requested. Stopping callback.
{code}

{code}
Application Report :
        Application-Id : application_1476340082683_0013
        Application-Name : HIVE-66493e0b-0ccd-4878-abe5-405d87d02b2d
        Application-Type : TEZ
        User : hive
        Queue : default
        Application Priority : null
        Start-Time : 1476343184157
        Finish-Time : 1476350064973
        Progress : 100%
        State : FAILED
        Final-State : FAILED
        Tracking-URL : 
http://node:8088/cluster/app/application_1476340082683_0013
        RPC Port : -1
        AM Host : N/A
        Aggregate Resource Allocation : 7045512 MB-seconds, 6880 vcore-seconds
        Log Aggregation Status : SUCCEEDED
        Diagnostics : Application application_1476340082683_0013 failed 1 times 
(global limit =2; local limit is =1) due to AM Container for 
appattempt_1476340082683_0013_000001 exited with  exitCode: -104
For more detailed output, check the application tracking page: 
http://node:8088/cluster/app/application_1476340082683_0013 Then click on links 
to logs of each attempt.
Diagnostics: Container 
[pid=19619,containerID=container_1476340082683_0013_01_000001] is running 
beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory 
used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1476340082683_0013_01_000001 :
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
        |- 19633 19619 19619 19619 (java) 32742 3597 2764886016 263532 
/usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx819m 
-Djava.io.tmpdir=/mnt/resource/hadoop/yarn/local/usercache/hive/appcache/application_1476340082683_0013/container_1476340082683_0013_01_000001/tmp
 -server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.2.0-37 
-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA 
-XX:+UseParallelGC 
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
-Dlog4j.configuration=tez-container-log4j.properties 
-Dyarn.app.container.log.dir=/mnt/resource/hadoop/yarn/log/application_1476340082683_0013/container_1476340082683_0013_01_000001
 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel= 
org.apache.tez.dag.app.DAGAppMaster --session
        |- 19619 19617 19619 19619 (bash) 0 0 11575296 753 /bin/bash -c 
/usr/lib/jvm/java-8-openjdk-amd64/bin/java  -Xmx819m 
-Djava.io.tmpdir=/mnt/resource/hadoop/yarn/local/usercache/hive/appcache/application_1476340082683_0013/container_1476340082683_0013_01_000001/tmp
 -server -Djava.net.preferIPv4Stack=true -Dhdp.version=2.5.2.0-37 
-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA 
-XX:+UseParallelGC 
-Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator 
-Dlog4j.configuration=tez-container-log4j.properties 
-Dyarn.app.container.log.dir=/mnt/resource/hadoop/yarn/log/application_1476340082683_0013/container_1476340082683_0013_01_000001
 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel='' 
org.apache.tez.dag.app.DAGAppMaster --session 
1>/mnt/resource/hadoop/yarn/log/application_1476340082683_0013/container_1476340082683_0013_01_000001/stdout
 
2>/mnt/resource/hadoop/yarn/log/application_1476340082683_0013/container_1476340082683_0013_01_000001/stderr

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt. Failing the application.
        Unmanaged Application : false
        Application Node Label Expression : <Not set>
        AM container Node Label Expression : <DEFAULT_PARTITION>
{code}

[~hitesh] mentioned the AM was killed as it went over mem limits.  


> TezUncheckedException: Cannot get ApplicationACLs before all services have 
> started
> ----------------------------------------------------------------------------------
>
>                 Key: TEZ-3468
>                 URL: https://issues.apache.org/jira/browse/TEZ-3468
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Kavan Suresh
>
> Tez AM throws a Dispatcher error during shutdown when running LLAP queries. 
> {code:java}
> 2016-10-13 09:14:24,999 [INFO] [LlapScheduler] 
> |tezplugins.LlapTaskSchedulerService|: ScheduleResult for Task: 
> TaskInfo{task=attempt_1476340082683_0013_137_01_000019_0, priority=14, 
> startTime=0, containerId=null, assignedInstance=null, uniqueId=7505, 
> localityDelayTimeout=9223372036854775807} = DELAYED_RESOURCES
> 2016-10-13 09:14:24,999 [INFO] [LlapScheduler] 
> |tezplugins.LlapTaskSchedulerService|: Preempting for 
> task=attempt_1476340082683_0013_137_01_000019_0 on any available host
> 2016-10-13 09:14:24,999 [INFO] [LlapScheduler] 
> |tezplugins.LlapTaskSchedulerService|: No tasks qualify as killable to 
> schedule tasks at priority 14
> 2016-10-13 09:14:24,999 [INFO] [Socket Reader #1 for port 36800] 
> |ipc.Server|: Auth successful for application_1476340082683_0013 (auth:SIMPLE)
> 2016-10-13 09:14:24,999 [ERROR] [Dispatcher thread {Central}] 
> |common.AsyncDispatcher|: Error in dispatcher thread
> org.apache.tez.dag.api.TezUncheckedException: Cannot get ApplicationACLs 
> before all services have started
>         at 
> org.apache.tez.dag.app.DAGAppMaster$RunningAppContext.getApplicationACLs(DAGAppMaster.java:1711)
>         at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:491)
>         at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:448)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:419)
>         at 
> org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:76)
>         at 
> org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:60)
>         at 
> org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:36)
>         at 
> org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>         at 
> org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
>         at java.lang.Thread.run(Thread.java:745)
> 2016-10-13 09:14:24,999 [WARN] [Dispatcher thread {Central}] 
> |container.AMContainerImpl|: Unexpected TA Assignment: TAId: 
> attempt_1476340082683_0013_137_01_000017_3  for ContainerId: 
> container_222212222_0013_01_007508 while in state: ALLOCATED
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to