[ https://issues.apache.org/jira/browse/HIVE-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211335#comment-15211335 ]
Rui Li commented on HIVE-12650:
-------------------------------

I think the difficult part is that we really don't know the possible reasons. All we get is a timeout; it could be due to a network issue, an exception, or the RSC simply being busy. Another possible refinement is that we can make the behavior more consistent. Like I said, there are currently two paths that can lead to a timeout/failure, and the user will see different error messages. How about removing the timeout at {{RemoteHiveSparkClient#createRemoteClient#getExecutorCount}}? I mean that after a certain amount of time, we can give up the pre-warm and eventually fail the job at the job monitor.

> Spark-submit is killed when Hive times out. Killing spark-submit doesn't
> cancel the AM request. When the AM is finally launched, it tries to connect back to
> Hive and gets refused.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-12650
>                 URL: https://issues.apache.org/jira/browse/HIVE-12650
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.1, 1.2.1
>            Reporter: JoneZhang
>            Assignee: Xuefu Zhang
>
> I think hive.spark.client.server.connect.timeout should be set greater than
> spark.yarn.am.waitTime. The default value for
> spark.yarn.am.waitTime is 100s, and the default value for
> hive.spark.client.server.connect.timeout is 90s, which is too short. We can
> increase it to a larger value such as 120s.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
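For reference, the adjustment suggested in the issue description could be sketched as a hive-site.xml override (the property name comes from the report; the 120000 ms value is illustrative, and the timeout is specified in milliseconds):

```xml
<!-- hive-site.xml: raise the Hive->RSC connect timeout above
     spark.yarn.am.waitTime (default 100s) so Hive does not kill
     spark-submit before the YARN AM has a chance to connect back. -->
<property>
  <name>hive.spark.client.server.connect.timeout</name>
  <!-- default is 90000 ms (90s); 120000 ms (120s) is the value proposed here -->
  <value>120000</value>
</property>
```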