[
https://issues.apache.org/jira/browse/HIVE-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903422#comment-15903422
]
Chaoyu Tang edited comment on HIVE-16071 at 3/9/17 5:24 PM:
------------------------------------------------------------
So we reached the consensus that hive.spark.client.server.connect.timeout
should not be used for the cancelTask on the RPCServer side. The proposed
value for it could be hive.spark.client.connect.timeout.
[~xuefuz] The reason I previously suggested we could consider another
timeout for the cancelTask (a little longer than
hive.spark.client.connect.timeout) is to give the RemoteDriver a little more
time than the RPCServer to time out the handshake. If the timeouts on both
sides are set to exactly the same value, we might quite often see situations
where the termination of the SASL handshake is initiated by the cancelTask on
the RpcServer side, because the timeout on the RemoteDriver side might fire
slightly later for whatever reason. During this short window, the handshake
could still have a chance to succeed if it is not terminated by the cancelTask.
To my understanding, shortening the cancelTask timeout is mainly so that the
RpcServer detects the handshake timeout (fired by the RemoteDriver) sooner; we
still want the RemoteDriver to mainly control the SASL handshake timeout, and
most handshake timeouts should be fired from the RemoteDriver, right?
In addition, I think we should
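As an illustration of the staggering described above, here is a minimal,
self-contained Java sketch. All names (StaggeredTimeouts, HANDSHAKE_TIMEOUT_MS,
CANCEL_TASK_TIMEOUT_MS) are hypothetical, not actual Hive code or config keys;
in real Hive-on-Spark these values come from HiveConf, not constants:
{code}
import java.util.concurrent.*;

public class StaggeredTimeouts {
    // Hypothetical values: the client (RemoteDriver) times out the SASL
    // handshake first; the server-side cancelTask fires slightly later.
    static final long HANDSHAKE_TIMEOUT_MS = 1000;    // client side
    static final long CANCEL_TASK_TIMEOUT_MS = 1200;  // server side, a bit longer

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(2);
        CompletableFuture<Void> handshake = new CompletableFuture<>();

        // Client-side timeout: fails the pending handshake first.
        timer.schedule(() -> handshake.completeExceptionally(
                new TimeoutException("client: SASL handshake timed out")),
                HANDSHAKE_TIMEOUT_MS, TimeUnit.MILLISECONDS);

        // Server-side cancelTask: a no-op here, because the handshake future
        // is already completed by the time the longer timeout fires.
        timer.schedule(() -> handshake.completeExceptionally(
                new TimeoutException("server: cancelTask terminated handshake")),
                CANCEL_TASK_TIMEOUT_MS, TimeUnit.MILLISECONDS);

        try {
            handshake.get();  // in real code, completed on SASL success
        } catch (ExecutionException | InterruptedException e) {
            System.out.println(e.getCause() != null
                    ? e.getCause().getMessage() : e.getMessage());
        } finally {
            timer.shutdownNow();
        }
    }
}
{code}
With equal values the two terminations race and either side may win; with the
staggered values above, the client-side timeout reliably fires first, which is
the behavior we want.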
> Spark remote driver misuses the timeout in RPC handshake
> --------------------------------------------------------
>
> Key: HIVE-16071
> URL: https://issues.apache.org/jira/browse/HIVE-16071
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Attachments: HIVE-16071.patch
>
>
> Based on its property description in HiveConf and the comments in HIVE-12650
> (https://issues.apache.org/jira/browse/HIVE-12650?focusedCommentId=15128979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15128979),
> hive.spark.client.connect.timeout is the timeout for the spark remote driver
> to make a socket connection (channel) to the RPC server. But currently it is
> also used by the remote driver for the RPC client/server handshake, which is
> not right. Instead, hive.spark.client.server.connect.timeout should be used,
> and it is already used by the RPCServer in the handshake.
> Errors like the following are usually caused by this issue, since the default
> hive.spark.client.connect.timeout value (1000ms), used by the remote driver
> for the handshake, is a little too short.
> {code}
> 17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>     at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>     at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
>     at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> Caused by: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>     at org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:453)
>     at org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
> {code}
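>
> To make the two settings and their defaults concrete, a minimal plain-Java
> sketch (not the actual Hive RPC code; the Properties object stands in for
> HiveConf):
> {code}
> import java.util.Properties;
>
> public class RpcTimeouts {
>     public static void main(String[] args) {
>         Properties conf = new Properties();  // stand-in for HiveConf
>
>         // hive.spark.client.connect.timeout: socket connect only;
>         // default 1000 ms.
>         long connectMs = Long.parseLong(conf.getProperty(
>                 "hive.spark.client.connect.timeout", "1000"));
>
>         // hive.spark.client.server.connect.timeout: handshake/connection
>         // between HiveServer2 and the remote driver; default 90000 ms.
>         long serverConnectMs = Long.parseLong(conf.getProperty(
>                 "hive.spark.client.server.connect.timeout", "90000"));
>
>         System.out.println("socket connect timeout = " + connectMs
>                 + " ms, server connect (handshake) timeout = "
>                 + serverConnectMs + " ms");
>     }
> }
> {code}
> Raising hive.spark.client.connect.timeout can work around the error above,
> but the proper fix (HIVE-16071.patch) is to bound the handshake by the
> server connect timeout instead.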