[ https://issues.apache.org/jira/browse/HIVE-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15211335#comment-15211335 ]
Rui Li commented on HIVE-12650:
-------------------------------
I think the difficult part is that we really don't know the underlying cause.
All we get is a timeout, which could be due to a network issue, an exception,
or the RSC simply being busy.
Another possible refinement is to make the behavior more consistent.
As I said, there are currently two paths that can lead to a timeout/failure, and
the user sees a different error message for each. How about removing the timeout
at {{RemoteHiveSparkClient#createRemoteClient#getExecutorCount}}? That is, after
a certain amount of time we could give up on the pre-warm and let the job
monitor eventually fail the job.
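The proposal above can be sketched roughly as follows: the pre-warm polls for executors up to a deadline but, instead of raising its own timeout error, simply returns whatever count it reached, leaving the job monitor as the single place that fails the job. This is a minimal Python sketch under stated assumptions, not Hive's actual Java code; `prewarm_executors`, its parameters, and the injectable `clock`/`sleep` hooks are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def prewarm_executors(
    get_executor_count: Callable[[], int],
    target: int,
    max_wait_s: float,
    poll_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical pre-warm: poll until `target` executors are up or
    `max_wait_s` elapses, then return the count seen so far.

    Giving up here never raises, so there is no second timeout path with
    its own error message; a genuinely failed job is left to be reported
    by the job monitor.
    """
    deadline = clock() + max_wait_s
    count = get_executor_count()
    while count < target and clock() < deadline:
        sleep(poll_s)
        count = get_executor_count()
    return count
```

With `max_wait_s=2.0` and no executors ever registering, the call just returns `0` after about two seconds; the job is then submitted anyway and, if the remote context really is unreachable, the failure surfaces once, from the monitor.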
> Spark-submit is killed when Hive times out. Killing spark-submit doesn't
> cancel AM request. When AM is finally launched, it tries to connect back to
> Hive and gets refused.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-12650
> URL: https://issues.apache.org/jira/browse/HIVE-12650
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.1.1, 1.2.1
> Reporter: JoneZhang
> Assignee: Xuefu Zhang
>
> I think hive.spark.client.server.connect.timeout should be set greater than
> spark.yarn.am.waitTime. The default value of spark.yarn.am.waitTime is 100s,
> while the default value of hive.spark.client.server.connect.timeout is only
> 90s, i.e. shorter than the AM wait time. We could increase it to a larger
> value, such as 120s.
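The rule of thumb in the report (Hive's connect timeout should exceed the AM wait time) can be checked mechanically. The sketch below is an illustration only: `parse_duration_ms` and `connect_timeout_is_safe` are hypothetical helpers, the suffix table is an assumption covering common time suffixes rather than a copy of Hive's or Spark's own parsers, and bare numbers are assumed to mean milliseconds.

```python
import re

# Assumed suffix -> milliseconds table; not an exhaustive copy of the
# Hive or Spark time-value parsers.
_UNIT_MS = {"ms": 1.0, "s": 1000.0, "sec": 1000.0,
            "m": 60000.0, "min": 60000.0, "h": 3600000.0}


def parse_duration_ms(value: str, default_unit: str = "ms") -> float:
    """Parse strings like '90s', '90000' or '2min' into milliseconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*", value.lower())
    if match is None:
        raise ValueError(f"unparseable duration: {value!r}")
    number, unit = match.groups()
    unit = unit or default_unit  # bare number: assumed milliseconds
    if unit not in _UNIT_MS:
        raise ValueError(f"unknown unit {unit!r} in {value!r}")
    return float(number) * _UNIT_MS[unit]


def connect_timeout_is_safe(hive_connect_timeout: str, am_wait_time: str) -> bool:
    """True if Hive waits strictly longer than the YARN AM wait time."""
    return parse_duration_ms(hive_connect_timeout) > parse_duration_ms(am_wait_time)
```

With the defaults quoted above, `connect_timeout_is_safe("90s", "100s")` is `False`, while the suggested `connect_timeout_is_safe("120s", "100s")` is `True`.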
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)