[
https://issues.apache.org/jira/browse/HIVE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604793#comment-16604793
]
Brock Noland commented on HIVE-20506:
-------------------------------------
I believe we could fix this by the following method:
# In {{SparkSubmitSparkClient}}, capture the YARN application id and pass it to
{{RpcServer}}.
# {{RpcServer}}, or rather the {{SaslServerHandler}} inside that class, keeps
extending the timeout while the YARN application is in the {{ACCEPTED}} state.
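For step 1, a minimal sketch of what the capture could look like: {{SparkSubmitSparkClient}} already reads spark-submit's output line by line, so the application id can be scraped with a regex as the lines stream by. The class name {{AppIdCapture}} and the exact log line format below are illustrative assumptions, not Hive's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: scan spark-submit output lines for the YARN
// application id so it can be handed to RpcServer.
public class AppIdCapture {
    // YARN application ids have the form application_<clusterTs>_<seq>.
    private static final Pattern APP_ID =
            Pattern.compile("(application_\\d+_\\d+)");

    /** Returns the application id found in the line, or null if absent. */
    static String findAppId(String logLine) {
        Matcher m = APP_ID.matcher(logLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Example line shaped like spark-submit's submission message.
        String line = "INFO Client: Submitted application application_1536000000000_0042";
        System.out.println(findAppId(line)); // application_1536000000000_0042
    }
}
```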
I believe this will cause HS2 to wait for the YARN job to actually start
before counting down the timeout. However, the timeout will still apply if the
application starts but fails to connect back to HS2.
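For step 2, the timeout-extension logic above can be sketched as follows. This is a self-contained model of the idea, not {{SaslServerHandler}}'s real code: the {{AppState}} enum, the {{awaitLaunch}} method, and the iterator-based poller are all illustrative assumptions. The point it demonstrates is that each poll observing {{ACCEPTED}} resets the countdown, so the configured connect timeout only starts running once the AM has actually launched.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical sketch: keep extending the RPC connect timeout while the
// YARN application is still ACCEPTED (i.e. queued because the cluster is full).
public class TimeoutExtender {
    enum AppState { ACCEPTED, RUNNING, FAILED }

    /**
     * Polls the application state and returns the remaining timeout budget
     * (ms) once the application has left ACCEPTED. While ACCEPTED, the
     * countdown is reset on every poll; afterwards the normal countdown
     * applies, so an AM that launches but never connects back still times out.
     */
    static long awaitLaunch(Iterator<AppState> statePoller, long timeoutMs, long pollMs) {
        long remaining = timeoutMs;
        while (statePoller.hasNext()) {
            AppState s = statePoller.next();
            if (s != AppState.ACCEPTED) {
                return remaining;        // AM launched (or failed): stop extending
            }
            remaining = timeoutMs;       // still queued: don't burn the timeout
            remaining -= pollMs;         // account for the wait until the next poll
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Simulated state sequence: queued twice, then running.
        Iterator<AppState> polls = Arrays.asList(
                AppState.ACCEPTED, AppState.ACCEPTED, AppState.RUNNING).iterator();
        System.out.println(awaitLaunch(polls, 1000L, 400L)); // 600
    }
}
```

In the real handler this would presumably hook into the Netty timeout task rather than a polling loop, but the invariant is the same: the countdown is only allowed to expire once the application has left {{ACCEPTED}}.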
> HOS times out when cluster is full while Hive-on-MR waits
> ---------------------------------------------------------
>
> Key: HIVE-20506
> URL: https://issues.apache.org/jira/browse/HIVE-20506
> Project: Hive
> Issue Type: Improvement
> Reporter: Brock Noland
> Priority: Major
>
> My understanding is as follows:
> Hive-on-MR when the cluster is full will wait for resources to be available
> before submitting a job. This is because the hadoop jar command is the
> primary mechanism Hive uses to know if a job is complete.
>
> Hive-on-Spark will time out after {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} because
> the RPC client in the AppMaster doesn't connect back to the RPC Server in
> HS2.
> This is a behavior difference it'd be great to close.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)