[ https://issues.apache.org/jira/browse/HIVE-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211335#comment-15211335 ]

Rui Li commented on HIVE-12650:
-------------------------------

I think the difficult part is that we can't tell what actually caused the 
failure. All we get is a timeout, which could be due to a network issue, an 
exception, or simply the RSC being busy.

Another possible refinement is to make the behavior more consistent. As I 
said, there are now two paths that can lead to a timeout/failure, and the user 
sees a different error message for each. How about removing the timeout at 
{{RemoteHiveSparkClient#createRemoteClient#getExecutorCount}}? That is, after 
a certain amount of time we can give up on the pre-warm and let the job 
eventually fail in the job monitor.
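
A minimal sketch of the "give up on the pre-warm" idea, assuming the pre-warm 
simply polls an executor count: wait up to a deadline, and if the target is not 
reached, return instead of throwing a timeout, leaving any real failure to the 
job monitor. This is not Hive's code; `BestEffortPreWarm`, `waitForExecutors` 
and its parameters are hypothetical, and the executor count is modeled as a 
plain `IntSupplier`.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/**
 * Hypothetical best-effort pre-warm (illustrative only, not Hive's implementation).
 * Polls an executor count until enough executors register or a deadline passes.
 */
public final class BestEffortPreWarm {

    private BestEffortPreWarm() {}

    /**
     * @return true if at least minExecutors registered in time; false if we gave up.
     *         Giving up is not an error: the query proceeds and a genuine failure
     *         is left for the job monitor to report.
     */
    public static boolean waitForExecutors(IntSupplier executorCount, int minExecutors,
                                           long maxWaitMillis, long pollIntervalMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (true) {
            if (executorCount.getAsInt() >= minExecutors) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // stop pre-warming instead of throwing a timeout
            }
            // Sleep for the poll interval, but not past the deadline (at least 1 ms).
            long sleepMillis = Math.max(1L,
                    Math.min(pollIntervalMillis, TimeUnit.NANOSECONDS.toMillis(remainingNanos)));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated cluster where one more executor registers on every poll.
        int[] registered = {0};
        boolean warmed = waitForExecutors(() -> ++registered[0], 3, 1_000, 10);
        System.out.println("pre-warm reached target: " + warmed);

        // Simulated cluster that never gets executors: gives up after about 50 ms.
        boolean gaveUp = !waitForExecutors(() -> 0, 3, 50, 10);
        System.out.println("gave up without failing the query: " + gaveUp);
    }
}
```

Returning `false` rather than throwing is the point of this design: a slow 
cluster only loses the pre-warm, and if the job really cannot run, the user 
sees a single, consistent error from the job monitor.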

> Spark-submit is killed when Hive times out. Killing spark-submit doesn't 
> cancel AM request. When AM is finally launched, it tries to connect back to 
> Hive and gets refused.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-12650
>                 URL: https://issues.apache.org/jira/browse/HIVE-12650
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.1, 1.2.1
>            Reporter: JoneZhang
>            Assignee: Xuefu Zhang
>
> I think hive.spark.client.server.connect.timeout should be set to a value 
> greater than spark.yarn.am.waitTime. The default value of 
> spark.yarn.am.waitTime is 100s, while the default value of 
> hive.spark.client.server.connect.timeout is 90s, which is too short. We could 
> increase it to a larger value, such as 120s.
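
The relationship proposed in the issue can be expressed as a simple check. This 
is a hypothetical helper, not part of Hive or Spark: `TimeoutCheck`, `toMillis` 
and `connectTimeoutExceedsAmWait` are illustrative names, the duration parser 
only handles the `ms`/`s`/`min`/`h` suffixes, and the fallback values are the 
90s and 100s defaults quoted in the issue.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sanity check (illustrative only): is Hive's RSC connect timeout
 * longer than the YARN AM wait time?
 */
public final class TimeoutCheck {

    private TimeoutCheck() {}

    /** Parses durations such as "90s", "100s" or "90000ms" into milliseconds. */
    public static long toMillis(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        // Check "ms" before "s", since "ms" also ends with "s".
        if (v.endsWith("ms")) {
            return Long.parseLong(v.substring(0, v.length() - 2).trim());
        } else if (v.endsWith("s")) {
            return TimeUnit.SECONDS.toMillis(Long.parseLong(v.substring(0, v.length() - 1).trim()));
        } else if (v.endsWith("min")) {
            return TimeUnit.MINUTES.toMillis(Long.parseLong(v.substring(0, v.length() - 3).trim()));
        } else if (v.endsWith("h")) {
            return TimeUnit.HOURS.toMillis(Long.parseLong(v.substring(0, v.length() - 1).trim()));
        }
        throw new IllegalArgumentException("Duration needs a unit (ms, s, min, h): " + value);
    }

    /** True if the connect timeout is strictly longer than the AM wait time. */
    public static boolean connectTimeoutExceedsAmWait(Map<String, String> conf) {
        long connect = toMillis(conf.getOrDefault("hive.spark.client.server.connect.timeout", "90s"));
        long amWait = toMillis(conf.getOrDefault("spark.yarn.am.waitTime", "100s"));
        return connect > amWait;
    }

    public static void main(String[] args) {
        // Defaults quoted in the issue: 90s vs 100s, so the check fails.
        System.out.println(connectTimeoutExceedsAmWait(Map.of()));
        // The larger value suggested in the issue: 120s vs 100s.
        System.out.println(connectTimeoutExceedsAmWait(
                Map.of("hive.spark.client.server.connect.timeout", "120s")));
    }
}
```

With the quoted defaults this prints `false` (90000 ms is not greater than 
100000 ms), and with the suggested 120s it prints `true`.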



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
