[
https://issues.apache.org/jira/browse/TEZ-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700383#comment-14700383
]
Hitesh Shah commented on TEZ-2728:
----------------------------------
Could you attach the full stack trace/log?
This looks like the IPC call eventually timed out. I am not sure we can safely
assume that the session is not running when the RM is down, since the RM could
later come back up and recover the YARN application.
Instead, should Hive consider treating any exception from submitDAG as a reason
to try killing the session and retrying with a new one?
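
To make the suggestion concrete, here is a rough sketch of what "kill the
session and retry with a new one" could look like on the caller's side. It is
not Hive or Tez code: Session, submitWithRetry, and the factory are
hypothetical stand-ins, and the real TezClient API is not used here.

```java
import java.util.function.Supplier;

public class RetryWithNewSession {

    /** Hypothetical stand-in for a Tez session; NOT the real TezClient API. */
    interface Session {
        String submit(String dag) throws Exception;
        void kill() throws Exception;
    }

    /**
     * Submits the DAG. On any exception, makes a best-effort attempt to kill
     * the current session, then retries on a freshly opened one.
     */
    static String submitWithRetry(Supplier<Session> sessionFactory, String dag,
                                  int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Session session = sessionFactory.get();
            try {
                return session.submit(dag);
            } catch (Exception e) {
                last = e;
                try {
                    session.kill();
                } catch (Exception killFailure) {
                    // The kill can fail as well (for example, the RM is still
                    // down), so record it and keep the original error.
                    e.addSuppressed(killFailure);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] opened = {0};
        int[] killed = {0};
        // Hypothetical factory: the first session fails the way the log in
        // this issue does, the second one accepts the DAG.
        Supplier<Session> factory = () -> {
            final int id = ++opened[0];
            return new Session() {
                public String submit(String dag) throws Exception {
                    if (id == 1) {
                        throw new java.net.ConnectException("Connection refused");
                    }
                    return dag + " accepted by session " + id;
                }
                public void kill() {
                    killed[0]++;
                }
            };
        };
        System.out.println(submitWithRetry(factory, "dag-1", 2));
    }
}
```

The sketch treats every exception the same way, which sidesteps the question
of whether a connection failure really means the session is gone. The kill is
best-effort only: if the RM later recovers the application, the old session
may still need to be cleaned up separately.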
> Wrap IPC connection Exception as SessionNotRunning - RM crash
> -------------------------------------------------------------
>
> Key: TEZ-2728
> URL: https://issues.apache.org/jira/browse/TEZ-2728
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0, 0.5.4, 0.6.2, 0.8.0
> Reporter: Gopal V
> Assignee: Hitesh Shah
>
> Crashing the RM when a query session is open and restarting it does not
> result in a recoverable state for a Hive session.
> {code}
> 2015-08-17T22:34:21,981 INFO [main]: ipc.Client
> (Client.java:handleConnectionFailure(885)) - Retrying connect to server:
> cn042-10.sandbox.hortonworks.com/172.19.128.42:10200. Already tried 48
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> 2015-08-17T22:34:22,982 INFO [main]: ipc.Client
> (Client.java:handleConnectionFailure(885)) - Retrying connect to server:
> cn042-10.sandbox.hortonworks.com/172.19.128.42:10200. Already tried 49
> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=50,
> sleepTime=1000 MILLISECONDS)
> 2015-08-17T22:34:22,987 ERROR [main]: exec.Task (TezTask.java:execute(195)) -
> Failed to execute tez graph.
> java.net.ConnectException: Call From
> cn041.sandbox.hortonworks.com/172.19.128.41 to
> cn042.sandbox.hortonworks.com:10200 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method) ~[?:1.8.0_51]
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> ~[?:1.8.0_51]
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> ~[?:1.8.0_51]
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> ~[?:1.8.0_51]
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
> ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
> ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
> at org.apache.hadoop.ipc.Client.call(Client.java:1444)
> ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
> at org.apache.hadoop.ipc.Client.call(Client.java:1371)
> ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> ~[hadoop-common-2.8.0-20150722.003145-873.jar:?]
> at com.sun.proxy.$Proxy41.getApplicationReport(Unknown Source) ~[?:?]
> at
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationHistoryProtocolPBClientImpl.getApplicationReport(ApplicationHistoryProtocolPBClientImpl.java:108)
> ~[hadoop-yarn-common-2.8.0-20150721.221214-843.jar:?]
> at
> org.apache.hadoop.yarn.client.api.impl.AHSClientImpl.getApplicationReport(AHSClientImpl.java:101)
> ~[hadoop-yarn-client-2.8.0-20150721.221233-841.jar:?]
> at
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:442)
> ~[hadoop-yarn-client-2.8.0-20150721.221233-841.jar:?]
> at
> org.apache.tez.client.TezYarnClient.getApplicationReport(TezYarnClient.java:89)
> ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> at
> org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:835)
> ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> at org.apache.tez.client.TezClient.getAMProxy(TezClient.java:713)
> ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> at org.apache.tez.client.TezClient.waitForProxy(TezClient.java:723)
> ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> at
> org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:453)
> ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> at org.apache.tez.client.TezClient.submitDAG(TezClient.java:391)
> ~[tez-api-0.8.0-SNAPSHOT.jar:0.8.0-SNAPSHOT]
> at
> org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:409)
> ~[hive-exec-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> {code}