[ https://issues.apache.org/jira/browse/HIVE-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900627#comment-15900627 ]

Rui Li commented on HIVE-16071:
-------------------------------

Hi [~xuefuz], in your example, if the SASL handshake doesn't finish in time, 
the client side will exit after 1s. Even if netty can't detect the 
disconnection immediately, I don't think it takes 1h to detect it. Besides, the 
cancelTask only closes the channel; it doesn't fail the Future. Therefore we 
can't really rely on the cancelTask to stop the waiting. My proposal is:
# We need to reliably detect disconnection. I think netty is good enough for 
this (maybe with some reasonable delay), but I'm also OK with keeping the 
cancelTask so we close the channel ourselves.
# We need to reliably cancel the Future when disconnection is detected. This 
can be done in the SaslHandler, which monitors the channel inactive event.
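
To illustrate step 2, here's a minimal sketch using plain JDK futures (the class and method names are hypothetical stand-ins, not Hive's or netty's actual classes): failing the handshake Future from the channel-inactive callback wakes the waiter immediately, instead of letting it sit out the full {{hive.spark.client.server.connect.timeout}}.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandshakeFutureSketch {
    // Stand-in for the wait on the SASL handshake result; the 90s limit
    // plays the role of hive.spark.client.server.connect.timeout.
    static String awaitHandshake(CompletableFuture<Void> handshake) {
        try {
            handshake.get(90, TimeUnit.SECONDS);
            return "ok";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (TimeoutException | InterruptedException e) {
            return "timed out";
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> handshake = new CompletableFuture<>();

        // Stand-in for SaslHandler.channelInactive(): on disconnection,
        // fail the Future so any waiter returns at once.
        Runnable onChannelInactive = () -> handshake.completeExceptionally(
            new javax.security.sasl.SaslException(
                "Client closed before SASL negotiation finished."));

        onChannelInactive.run();                       // simulate the disconnect
        System.out.println(awaitHandshake(handshake)); // returns immediately
    }
}
```

The point of the sketch is the ordering: without the exceptional completion, {{awaitHandshake}} would block for the whole timeout even though the channel is already gone.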

I also did some tests to verify this. I modified the client code so that it 
makes the connection but doesn't finish the SASL handshake. I tried two ways 
to do this: one is the client never sends the SaslMessage; the other is the 
client sends the SaslMessage and then just exits. The tests were done in 
yarn-cluster mode.
# If no SaslMessage is sent, Hive will still wait for 
{{hive.spark.client.server.connect.timeout}}, even if the cancelTask closes 
the channel after 1s.
# If the SaslMessage is sent, SaslHandler will detect the disconnection and 
cancel the Future, regardless of whether the cancelTask fires. Of course, this 
requires netty to detect the disconnection.
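
The two misbehaving clients can be sketched with plain sockets (hypothetical stand-ins; the real test modified Hive's remote-driver client, and here a throwaway local ServerSocket plays the RPC server):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class HandshakeClientsSketch {
    // Runs both scenarios against a local throwaway server and returns how
    // many completed; names and wire format are illustrative only.
    static int runScenarios() throws IOException {
        int done = 0;
        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort();

            // Scenario 1: connect but never send a SaslMessage. The server
            // sees an open channel and nothing else, so only a timeout (or
            // the cancelTask closing the channel) ends the wait.
            try (Socket silent = new Socket()) {
                silent.connect(new InetSocketAddress("127.0.0.1", port), 1000);
                done++;
            }

            // Scenario 2: send a first message, then exit abruptly. Closing
            // the socket here is the disconnection that a channel-inactive
            // handler (like SaslHandler) can observe and turn into a failed
            // or cancelled Future.
            try (Socket eager = new Socket("127.0.0.1", port)) {
                eager.getOutputStream().write(
                    "SaslMessage".getBytes(StandardCharsets.UTF_8));
                done++;
            }
        }
        return done;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("scenarios completed: " + runScenarios());
    }
}
```

Scenario 1 is the case where only a timeout helps; scenario 2 is the one a channel-inactive handler can turn into a prompt failure.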

> Spark remote driver misuses the timeout in RPC handshake
> --------------------------------------------------------
>
>                 Key: HIVE-16071
>                 URL: https://issues.apache.org/jira/browse/HIVE-16071
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-16071.patch
>
>
> Based on its property description in HiveConf and the comments in HIVE-12650 
> (https://issues.apache.org/jira/browse/HIVE-12650?focusedCommentId=15128979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15128979),
>  hive.spark.client.connect.timeout is the timeout for the spark remote 
> driver to make a socket connection (channel) to the RPC server. But 
> currently it is also used by the remote driver for RPC client/server 
> handshaking, which is not right. Instead, 
> hive.spark.client.server.connect.timeout should be used, as it is already 
> used by the RPCServer in the handshaking.
> An error like the following is usually caused by this issue, since the 
> default hive.spark.client.connect.timeout value (1000ms), used by the remote 
> driver for handshaking, is a little too short.
> {code}
> 17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: 
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: 
> Client closed before SASL negotiation finished.
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: 
> Client closed before SASL negotiation finished.
>         at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>         at 
> org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
>         at 
> org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> Caused by: javax.security.sasl.SaslException: Client closed before SASL 
> negotiation finished.
>         at 
> org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:453)
>         at 
> org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
