[ https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832159#comment-15832159 ]
Marcelo Vanzin commented on HIVE-15671:
---------------------------------------
[~xuefuz] I see what you mean, but I think your analysis is slightly off.
1 and 2 are actually where the problem, if any, is; 2 should use
{{getConnectTimeoutMs()}} instead of the server version
({{getServerConnectTimeoutMs()}}). As Rui said, the "server timeout" here,
which is really the "authentication timeout", needs to be much longer than the
client timeout because it includes the time it takes to start the driver.
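To make the ordering argument concrete, here is a minimal Java sketch. It assumes the server starts waiting as soon as the client is registered, the driver can only dial back after spark-submit has brought it up, and the SASL handshake adds some extra time on top. TimeoutBudget and its parameters are hypothetical names, and the numbers in main are illustrative, not Hive defaults.

```java
// Hypothetical sketch: why the server-side ("authentication") timeout must be
// much longer than the client-side connect timeout. Not Hive code.
public final class TimeoutBudget {

    private TimeoutBudget() {}

    /**
     * Worst-case time before the server sees an authenticated client:
     * the driver must first start, may then use its whole connect timeout
     * to reach the server, and finally has to finish the handshake.
     */
    public static long minServerTimeoutMs(long driverStartupMs,
                                          long clientConnectTimeoutMs,
                                          long handshakeMs) {
        return driverStartupMs + clientConnectTimeoutMs + handshakeMs;
    }

    /** True if the server would still be waiting when that worst case ends. */
    public static boolean isSufficient(long serverTimeoutMs,
                                       long driverStartupMs,
                                       long clientConnectTimeoutMs,
                                       long handshakeMs) {
        return serverTimeoutMs
            >= minServerTimeoutMs(driverStartupMs, clientConnectTimeoutMs, handshakeMs);
    }

    public static void main(String[] args) {
        // Illustrative values only.
        long driverStartupMs = 30_000;       // time for spark-submit to bring up the driver
        long clientConnectTimeoutMs = 1_000; // driver-side connect timeout
        long handshakeMs = 500;              // SASL handshake after connecting
        System.out.println(isSufficient(90_000, driverStartupMs, clientConnectTimeoutMs, handshakeMs)); // true
        System.out.println(isSufficient(1_000, driverStartupMs, clientConnectTimeoutMs, handshakeMs));  // false
    }
}
```

A server timeout sized like the client timeout expires long before a normally starting driver could even attempt to connect, which is the point Rui made.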
So basically: all calls made on the client side (= Spark driver) should use
{{getConnectTimeoutMs()}}, and all calls made on the server side (= HS2) should
use {{getServerConnectTimeoutMs()}} (although, if I remember the code correctly,
the one timeout set up in {{registerClient()}} ends up taking precedence over
all others on the server path).
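A minimal sketch of that rule, assuming a simplified configuration held as a map of millisecond values. RpcTimeouts and Side are hypothetical names; the two getters only mirror the method names used above and this is not Hive's actual configuration class. Only the two property names come from Hive itself.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: pick the timeout by which side of the RPC link makes the call.
public final class RpcTimeouts {

    /** Which process is making the call. */
    public enum Side { CLIENT_DRIVER, SERVER_HS2 }

    public static final String CLIENT_KEY = "hive.spark.client.connect.timeout";
    public static final String SERVER_KEY = "hive.spark.client.server.connect.timeout";

    private final long connectTimeoutMs;
    private final long serverConnectTimeoutMs;

    /** Simplification: values are plain milliseconds; both keys must be present. */
    public RpcTimeouts(Map<String, Long> conf) {
        this.connectTimeoutMs = Objects.requireNonNull(conf.get(CLIENT_KEY), CLIENT_KEY + " not set");
        this.serverConnectTimeoutMs = Objects.requireNonNull(conf.get(SERVER_KEY), SERVER_KEY + " not set");
    }

    public long getConnectTimeoutMs() { return connectTimeoutMs; }

    public long getServerConnectTimeoutMs() { return serverConnectTimeoutMs; }

    /** Driver-side calls get the client timeout; HS2-side calls get the server timeout. */
    public long timeoutFor(Side side) {
        return side == Side.CLIENT_DRIVER ? getConnectTimeoutMs() : getServerConnectTimeoutMs();
    }
}
```

Centralizing the choice in one method makes it harder for a call site on one side to pick up the other side's value by accident, which is the kind of mix-up discussed in this issue.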
> doing that has a bad consequence that Hive will wait as long to declare a
> failure if for any reason the remote driver becomes dead
That's kinda hard to solve, because the server doesn't know which client
connected until two things happen: first the driver has started, second the
driver completed the SASL handshake to identify itself. A lot of things can go
wrong in that time. There's already some code, IIRC, that fails the session if
the spark-submit job dies with an error, but aside from that, it's kinda hard
to do more.
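A rough sketch of what the server can do in that window, assuming a hypothetical PendingClients registry keyed by client ID: wait with a timeout, complete once the SASL handshake identifies the driver, and fail early if spark-submit exits with an error. It relies on CompletableFuture.orTimeout (Java 9+) and is not Hive's RpcServer implementation.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: track clients the server expects but cannot identify yet.
public final class PendingClients {

    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Server side: expect clientId to connect and authenticate within timeoutMs. */
    public CompletableFuture<String> registerClient(String clientId, long timeoutMs) {
        // orTimeout returns the same future, completed with TimeoutException on expiry.
        CompletableFuture<String> f =
            new CompletableFuture<String>().orTimeout(timeoutMs, TimeUnit.MILLISECONDS);
        if (pending.putIfAbsent(clientId, f) != null) {
            throw new IllegalStateException("client already registered: " + clientId);
        }
        // Drop the entry however the future ends: handshake, timeout, or failure.
        f.whenComplete((id, err) -> pending.remove(clientId, f));
        return f;
    }

    /** Called once the driver has connected and identified itself via SASL. */
    public boolean onHandshakeComplete(String clientId) {
        CompletableFuture<String> f = pending.get(clientId);
        return f != null && f.complete(clientId);
    }

    /** Called when the spark-submit process exits; a non-zero code fails the session early. */
    public boolean onSubmitProcessExit(String clientId, int exitCode) {
        CompletableFuture<String> f = pending.get(clientId);
        return f != null && exitCode != 0 && f.completeExceptionally(
            new IllegalStateException("spark-submit exited with code " + exitCode));
    }
}
```

In this sketch the timeout remains the only backstop when the driver hangs without its spark-submit process exiting, which is why the server-side value cannot simply be shortened.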
> RPCServer.registerClient() erroneously uses server/client handshake timeout
> for connection timeout
> --------------------------------------------------------------------------------------------------
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Affects Versions: 1.1.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Attachments: HIVE-15671.patch
>
>
> {code}
> /**
> * Tells the RPC server to expect a connection from a new client.
> * ...
> */
> public Future<Rpc> registerClient(final String clientId, String secret,
> RpcDispatcher serverDispatcher) {
> return registerClient(clientId, secret, serverDispatcher,
> config.getServerConnectTimeoutMs());
> }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of
> *hive.spark.client.server.connect.timeout*, which is meant as the timeout for
> the handshake between the Hive client and the remote Spark driver. Instead, the
> timeout should be *hive.spark.client.connect.timeout*, which is the timeout for
> the remote Spark driver to connect back to the Hive client.