[ https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832159#comment-15832159 ]

Marcelo Vanzin commented on HIVE-15671:
---------------------------------------

[~xuefuz] I see what you mean, but I think your analysis is slightly off.

Items 1 and 2 are actually where the problem, if there is one, lies; 2 should use
{{getConnectTimeoutMs()}} instead of the server version. As Rui said, the
"server timeout" here, which is really the "authentication timeout", needs to
be much longer than the client timeout, since it includes the time needed to
start the driver.

So basically: all calls made on the client side (= Spark driver) should use
{{getConnectTimeoutMs()}}, and all calls made on the server side (= HS2) should
use {{getServerConnectTimeoutMs()}} (although, if I remember the code correctly,
the timeout set up in {{registerClient()}} ends up taking precedence over all
others on the server path).
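A minimal sketch of that rule, assuming a hypothetical {{TimeoutPolicy}} class and {{RpcSide}} enum (neither is part of Hive's code) and illustrative values of 1 s and 90 s. It only shows which timeout each side would pick and that the server-side (authentication) timeout is expected to be the larger one; it does not model the precedence of the {{registerClient()}} timeout.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: choose the RPC timeout by the side of the connection
 * the code runs on. RpcSide, TimeoutPolicy and the values in main() are
 * illustrative assumptions, not Hive's API or defaults.
 */
public class TimeoutPolicyDemo {

    /** Which process is making the call. */
    enum RpcSide {
        CLIENT, // the remote Spark driver connecting back to HS2
        SERVER  // HS2 waiting for the driver to start and authenticate
    }

    static final class TimeoutPolicy {
        private final long connectTimeoutMs;       // client side: plain connect
        private final long serverConnectTimeoutMs; // server side: driver start + SASL handshake

        TimeoutPolicy(long connectTimeoutMs, long serverConnectTimeoutMs) {
            // The server-side wait covers driver startup, so it must be the longer one.
            if (serverConnectTimeoutMs <= connectTimeoutMs) {
                throw new IllegalArgumentException(
                    "server (authentication) timeout must exceed the client connect timeout");
            }
            this.connectTimeoutMs = connectTimeoutMs;
            this.serverConnectTimeoutMs = serverConnectTimeoutMs;
        }

        long timeoutFor(RpcSide side) {
            return side == RpcSide.CLIENT ? connectTimeoutMs : serverConnectTimeoutMs;
        }
    }

    public static void main(String[] args) {
        TimeoutPolicy policy = new TimeoutPolicy(
            TimeUnit.SECONDS.toMillis(1), TimeUnit.SECONDS.toMillis(90));
        System.out.println("driver connect timeout: "
            + policy.timeoutFor(RpcSide.CLIENT) + " ms");
        System.out.println("HS2 registration timeout: "
            + policy.timeoutFor(RpcSide.SERVER) + " ms");
    }
}
```

Rejecting a configuration where the server timeout is not larger is a design choice of this sketch, meant to make the asymmetry explicit.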

> doing that has a bad consequence that Hive will wait as long to declare a 
> failure if for any reason the remote driver becomes dead

That's kinda hard to solve, because the server doesn't know which client
connected until two things happen: first, the driver has started; second, the
driver has completed the SASL handshake to identify itself. A lot can go wrong
in that time. IIRC, there's already some code that fails the session if the
spark-submit job dies with an error, but beyond that, it's hard to do more.
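A rough sketch of what the server side can do while it waits, using hypothetical names ({{PendingClients}}, {{handshakeCompleted}}, {{launcherFailed}}) rather than Hive's actual RPC server code. Each expected client gets a future that is completed when the handshake identifies the client, failed by a timer if the handshake never finishes, or failed early if whatever monitors the launcher process reports an error. Until one of those events happens, the server cannot tell a slow driver from a dead one.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical sketch of server-side bookkeeping for clients that have been
 * registered but have not yet identified themselves. Names are illustrative
 * assumptions, not Hive's RPC server API.
 */
public class PendingClients {

    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();
    private final Map<String, CompletableFuture<String>> pending =
        new ConcurrentHashMap<>();

    /** Expect a client; fail its future if it has not authenticated in time. */
    public CompletableFuture<String> register(String clientId, long timeoutMs) {
        CompletableFuture<String> promise = new CompletableFuture<>();
        if (pending.putIfAbsent(clientId, promise) != null) {
            throw new IllegalStateException("client " + clientId + " is already registered");
        }
        ScheduledFuture<?> timeout = timer.schedule(() -> {
            // Only time out if the entry is still this promise (not yet resolved).
            if (pending.remove(clientId, promise)) {
                promise.completeExceptionally(new TimeoutException(
                    "client " + clientId + " did not complete the handshake in time"));
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
        // Whichever event resolves the promise first, stop the timer.
        promise.whenComplete((ok, err) -> timeout.cancel(false));
        return promise;
    }

    /** Called once the (SASL) handshake has identified the client. */
    public boolean handshakeCompleted(String clientId) {
        CompletableFuture<String> promise = pending.remove(clientId);
        return promise != null && promise.complete(clientId);
    }

    /** Called if the launcher process (e.g. spark-submit) exits with an error. */
    public void launcherFailed(String clientId, int exitCode) {
        CompletableFuture<String> promise = pending.remove(clientId);
        if (promise != null) {
            promise.completeExceptionally(new IllegalStateException(
                "launcher for " + clientId + " exited with code " + exitCode));
        }
    }

    public void shutdown() {
        timer.shutdownNow();
    }
}
```

In this sketch the launcher-failure path is the only way to fail sooner than the full registration timeout; a driver that hangs without its launcher exiting is still only caught by the timer.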


> RPCServer.registerClient() erroneously uses server/client handshake timeout 
> for connection timeout
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15671
>                 URL: https://issues.apache.org/jira/browse/HIVE-15671
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-15671.patch
>
>
> {code}
>   /**
>    * Tells the RPC server to expect a connection from a new client.
>    * ...
>    */
>   public Future<Rpc> registerClient(final String clientId, String secret,
>       RpcDispatcher serverDispatcher) {
>     return registerClient(clientId, secret, serverDispatcher,
>         config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of
> *hive.spark.client.server.connect.timeout*, which is meant as the timeout for
> the handshake between the Hive client and the remote Spark driver. Instead, the
> timeout should be *hive.spark.client.connect.timeout*, which is the timeout for
> the remote Spark driver to connect back to the Hive client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
