[
https://issues.apache.org/jira/browse/HIVE-20506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604809#comment-16604809
]
Sahil Takiar commented on HIVE-20506:
-------------------------------------
[~brocknoland] I think you might be hitting the
{{SPARK_RPC_CLIENT_HANDSHAKE_TIMEOUT}} timeout. {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}}
is the timeout for the SASL negotiation between the {{RemoteDriver}} and HS2
(yes, I know it's a bit confusing).
{{SPARK_RPC_CLIENT_HANDSHAKE_TIMEOUT}} is set to 90 seconds by default, so HoS
will essentially wait 90 seconds for the Spark application to be submitted. The
app has to be submitted to and accepted by YARN, and the {{RemoteDriver}} has to
start up and connect back to HS2, all within 90 seconds. Essentially, if the
cluster is busy, HoS will wait 90 seconds for the cluster to free up enough
resources for the Spark app to start before timing out.
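As a stopgap on a busy cluster, the handshake window can be widened. A sketch of the relevant hive-site.xml entry, assuming {{SPARK_RPC_CLIENT_HANDSHAKE_TIMEOUT}} maps to the {{hive.spark.client.server.connect.timeout}} property (the 300s value below is just an illustrative choice, not a recommendation):

```xml
<!-- hive-site.xml: give the RemoteDriver longer than the default 90s
     to be accepted by YARN, start up, and connect back to HS2. -->
<property>
  <name>hive.spark.client.server.connect.timeout</name>
  <value>300000ms</value>
</property>
```

This only stretches the wait; it doesn't give HoS the wait-indefinitely behavior of the hadoop jar submission path described below.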
Is my understanding of your problem correct?
I agree we should make the HoS behavior as close to the HoMR behavior as
possible. I'm not entirely sure what HoMR does. Is there a timeout for the
MapReduce application to be accepted?
> HOS times out when cluster is full while Hive-on-MR waits
> ---------------------------------------------------------
>
> Key: HIVE-20506
> URL: https://issues.apache.org/jira/browse/HIVE-20506
> Project: Hive
> Issue Type: Improvement
> Reporter: Brock Noland
> Priority: Major
>
> My understanding is as follows:
> Hive-on-MR when the cluster is full will wait for resources to be available
> before submitting a job. This is because the hadoop jar command is the
> primary mechanism Hive uses to know if a job is complete or failed.
>
> Hive-on-Spark will time out after {{SPARK_RPC_CLIENT_CONNECT_TIMEOUT}} because
> the RPC client in the AppMaster doesn't connect back to the RPC server in
> HS2.
> This is a behavior difference it'd be great to close.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)