[
https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15834115#comment-15834115
]
Rui Li commented on HIVE-15671:
-------------------------------
Hi [~xuefuz], I tried your case but couldn't reproduce the issue. Here are my
findings (without the patch):
# I set {{hive.spark.client.server.connect.timeout}} to 10 min and killed the
driver while the job was executing (Hive CLI, yarn-cluster mode).
# Hive detects the job failure instantly. Whether the CLI (which blocks on
{{RemoteSparkJobMonitor.startMonitor}}) also returns instantly depends on
whether it is in the middle of retrieving job progress from the driver. If it
is, the CLI has to wait for {{hive.spark.client.future.timeout}}, which
defaults to 1 min. If not, the CLI returns instantly.
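To illustrate the second finding, here is a minimal, self-contained sketch of why a pending progress request delays the return. It is not Hive's actual monitor code: {{FutureTimeoutSketch}}, {{pollProgress}} and {{PollResult}} are hypothetical names, and the 200 ms value only stands in for {{hive.spark.client.future.timeout}}. It assumes the progress request is a plain {{Future}} that nobody completes once the driver is gone.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names come from the Hive code base.
public class FutureTimeoutSketch {

    /** Outcome of one progress poll against the (possibly dead) driver. */
    enum PollResult { PROGRESS_RECEIVED, TIMED_OUT, FAILED }

    /**
     * Waits up to timeoutMs for a pending progress request.
     * timeoutMs plays the role of hive.spark.client.future.timeout (assumption).
     */
    static PollResult pollProgress(Future<String> pendingRequest, long timeoutMs)
            throws InterruptedException {
        try {
            pendingRequest.get(timeoutMs, TimeUnit.MILLISECONDS);
            return PollResult.PROGRESS_RECEIVED;
        } catch (TimeoutException e) {
            // Nobody answered: the caller was blocked for the full timeout.
            return PollResult.TIMED_OUT;
        } catch (ExecutionException e) {
            // The request was already marked as failed: no waiting needed.
            return PollResult.FAILED;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A request that is never answered, as if the driver died mid-request.
        CompletableFuture<String> unanswered = new CompletableFuture<>();
        long start = System.nanoTime();
        PollResult result = pollProgress(unanswered, 200);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(result + " after about " + elapsedMs + " ms");

        // A request that has already failed returns without blocking.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("driver gone"));
        System.out.println(pollProgress(failed, 200));
    }
}
```

The first call blocks for roughly the whole timeout because the future is never completed. The second returns right away, which matches the case where no progress request is in flight when the failure is detected.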
> RPCServer.registerClient() erroneously uses server/client handshake timeout
> for connection timeout
> --------------------------------------------------------------------------------------------------
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Affects Versions: 1.1.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
>
> {code}
> /**
>  * Tells the RPC server to expect a connection from a new client.
>  * ...
>  */
> public Future<Rpc> registerClient(final String clientId, String secret,
>     RpcDispatcher serverDispatcher) {
>   return registerClient(clientId, secret, serverDispatcher,
>       config.getServerConnectTimeoutMs());
> }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of
> *hive.spark.client.server.connect.timeout*, which is the timeout for the
> handshake between the Hive client and the remote Spark driver. The timeout
> used here should instead be *hive.spark.client.connect.timeout*, which is the
> timeout for the remote Spark driver to connect back to the Hive client.