[
https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831452#comment-15831452
]
Rui Li commented on HIVE-15671:
-------------------------------
Hi [~xuefuz], I tried your patch locally. {{hive.spark.client.connect.timeout}}
defaults to 1000ms, but starting the RemoteDriver can easily take longer than
that, so my job simply failed with "Timed out waiting for client
connection".
I'm not very familiar with the Rpc code. What I see is that we have two timeout
configs with different default values, and the specific code here needs the one
with the bigger default value. I'd really appreciate it if [~vanzin] could
explain in more detail the purposes of the two configs, and whether
they're used inconsistently as Xuefu pointed out.
Besides, the naming is really confusing to me: for example, SparkClient is the
RpcServer, and we pass ClientProtocol to the serverDispatcher, etc. I understand
the client/server concepts are probably reversed between HS2/RemoteDriver and
the Rpc layer. I wonder if it would be better to make them consistent somehow.
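To illustrate the failure mode above, here is a minimal standalone sketch (not Hive code; class and method names are made up for illustration) of a client waiting for a slow-starting driver under a 1000ms connect timeout:

```java
// Hypothetical sketch: a RemoteDriver that needs a few seconds to start,
// bounded by a 1000 ms connect timeout, yields the "Timed out" failure.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ConnectTimeoutDemo {
    // Simulate waiting for a driver that takes startupMs to connect back,
    // bounded by timeoutMs (the client connect timeout).
    static String waitForDriver(long startupMs, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> connection = pool.submit(() -> {
                Thread.sleep(startupMs); // driver startup time
                return "connected";
            });
            return connection.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "Timed out waiting for client connection";
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Driver needs ~3 s to start, but the timeout is only 1000 ms.
        System.out.println(waitForDriver(3000, 1000));
    }
}
```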
> RPCServer.registerClient() erroneously uses server/client handshake timeout
> for connection timeout
> --------------------------------------------------------------------------------------------------
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Affects Versions: 1.1.0
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Attachments: HIVE-15671.patch
>
>
> {code}
> /**
>  * Tells the RPC server to expect a connection from a new client.
>  * ...
>  */
> public Future<Rpc> registerClient(final String clientId, String secret,
>     RpcDispatcher serverDispatcher) {
>   return registerClient(clientId, secret, serverDispatcher,
>       config.getServerConnectTimeoutMs());
> }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of
> *hive.spark.client.server.connect.timeout*, which is the timeout for the
> handshake between the Hive client and the remote Spark driver. Instead, the
> timeout should be *hive.spark.client.connect.timeout*, which is the timeout
> for the remote Spark driver to connect back to the Hive client.
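A minimal sketch of the change the description suggests, using a stand-in config class rather than Hive's real RpcConfiguration (the default values here are illustrative, matching only the 1000ms client connect default mentioned in the discussion):

```java
// Stand-in for Hive's RpcConfiguration; not the real class.
public class RegisterClientSketch {
    static class RpcConfiguration {
        // hive.spark.client.connect.timeout (small default, 1000 ms)
        long getConnectTimeoutMs() { return 1000L; }
        // hive.spark.client.server.connect.timeout (larger default; value illustrative)
        long getServerConnectTimeoutMs() { return 90000L; }
    }

    // Timeout registerClient() would pass down after the proposed change.
    static long registerClientTimeoutMs(RpcConfiguration config) {
        // Before the patch: config.getServerConnectTimeoutMs() (handshake timeout).
        // After the patch: the connection timeout, per the issue description.
        return config.getConnectTimeoutMs();
    }

    public static void main(String[] args) {
        RpcConfiguration config = new RpcConfiguration();
        System.out.println("registerClient timeout ms: "
            + registerClientTimeoutMs(config));
    }
}
```

Note that, as the comment above points out, this smaller default may itself be too short for RemoteDriver startup, which is the open question in this thread.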
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)