[
https://issues.apache.org/jira/browse/HIVE-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900728#comment-15900728
]
Rui Li commented on HIVE-16071:
-------------------------------
Hi [~xuefuz], let me summarise my point: we're talking about two issues here -
detecting the disconnection and reacting to it. I think the root cause of your
example is that we don't react properly (i.e. we don't fail the future) on
disconnection.
Regarding detecting the disconnection, I suppose we can rely on netty. The
cancelTask serves as further insurance in case netty fails (or takes too long)
to detect it.
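To make the division of labour concrete, here is a minimal netty sketch (the
class and field names, e.g. DisconnectionWatcher and handshakePromise, are
mine and not Hive's actual code): channelInactive is the primary detection
path, and the scheduled task is the insurance.
{code}
import java.util.concurrent.TimeUnit;

import io.netty.channel.Channel;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.concurrent.Promise;

class DisconnectionWatcher extends ChannelInboundHandlerAdapter {
  private final Promise<Void> handshakePromise;

  DisconnectionWatcher(Promise<Void> handshakePromise) {
    this.handshakePromise = handshakePromise;
  }

  @Override
  public void channelInactive(ChannelHandlerContext ctx) throws Exception {
    // Primary path: netty reports that the channel went down.
    handshakePromise.tryFailure(new javax.security.sasl.SaslException(
        "Channel closed before handshake finished."));
    super.channelInactive(ctx);
  }

  static void scheduleCancelTask(Channel channel, Promise<Void> promise,
      long timeoutMs) {
    // Insurance: if netty fails (or takes too long) to detect the
    // disconnection, fail the handshake ourselves after the timeout.
    channel.eventLoop().schedule(() -> {
      if (promise.tryFailure(
          new java.util.concurrent.TimeoutException("Handshake timed out."))) {
        channel.close();
      }
    }, timeoutMs, TimeUnit.MILLISECONDS);
  }
}
{code}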
bq. let cancelTask fail the Future so that Hive stops waiting
As I mentioned in my proposal, I think SaslHandler is in a better position to
do this. SaslHandler is intended for the SASL handshake, and it removes itself
from the pipeline once the handshake finishes. Therefore, if SaslHandler
detects a disconnection, it means the channel was closed before the handshake
finished, and thus we should fail the Future. Do you think it makes sense to
open another JIRA for this?
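Roughly what I have in mind, as a sketch only (this is not the actual
SaslHandler code; evaluate and the field names are placeholders):
{code}
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.util.concurrent.Promise;

abstract class SaslHandlerSketch extends SimpleChannelInboundHandler<byte[]> {
  private final Promise<Void> clientFuture;
  private volatile boolean handshakeDone = false;

  SaslHandlerSketch(Promise<Void> clientFuture) {
    this.clientFuture = clientFuture;
  }

  @Override
  protected void channelRead0(ChannelHandlerContext ctx, byte[] challenge)
      throws Exception {
    if (evaluate(challenge)) {
      // Handshake finished: step out of the pipeline and succeed the future.
      handshakeDone = true;
      ctx.pipeline().remove(this);
      clientFuture.trySuccess(null);
    }
  }

  @Override
  public void channelInactive(ChannelHandlerContext ctx) throws Exception {
    if (!handshakeDone) {
      // Still in the pipeline, so the channel closed mid-handshake.
      clientFuture.tryFailure(new javax.security.sasl.SaslException(
          "Client closed before SASL negotiation finished."));
    }
    super.channelInactive(ctx);
  }

  /** Process one SASL challenge; return true when negotiation completes. */
  protected abstract boolean evaluate(byte[] challenge) throws Exception;
}
{code}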
> Spark remote driver misuses the timeout in RPC handshake
> --------------------------------------------------------
>
> Key: HIVE-16071
> URL: https://issues.apache.org/jira/browse/HIVE-16071
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Attachments: HIVE-16071.patch
>
>
> Based on its property description in HiveConf and the comments in HIVE-12650
> (https://issues.apache.org/jira/browse/HIVE-12650?focusedCommentId=15128979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15128979),
> hive.spark.client.connect.timeout is the timeout for the spark remote driver
> to make a socket connection (channel) to the RPC server. But currently it is
> also used by the remote driver for the RPC client/server handshake, which is
> not right. Instead, hive.spark.client.server.connect.timeout should be used,
> as it is already used by the RpcServer for the handshake.
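> A minimal sketch of the intended split (the configuration accessors below
> are hypothetical stand-ins, not Hive's actual RpcConfiguration API):
> {code}
> import java.util.concurrent.TimeUnit;
>
> import io.netty.bootstrap.Bootstrap;
> import io.netty.channel.Channel;
> import io.netty.channel.ChannelOption;
> import io.netty.util.concurrent.Promise;
>
> final class HandshakeTimeouts {
>   // Hypothetical accessors; the real values come from HiveConf.
>   static long connectTimeoutMs() {
>     return 1000L;   // hive.spark.client.connect.timeout (1000ms default)
>   }
>   static long serverConnectTimeoutMs() {
>     return 90000L;  // hive.spark.client.server.connect.timeout (90s default)
>   }
>
>   static void apply(Bootstrap bootstrap, Channel channel,
>       Promise<Void> handshake) {
>     // The short client timeout only bounds the socket connection.
>     bootstrap.option(ChannelOption.CONNECT_TIMEOUT_MILLIS,
>         (int) connectTimeoutMs());
>     // The SASL handshake waits on the longer server-connect timeout,
>     // as the RpcServer side already does.
>     channel.eventLoop().schedule(
>         () -> handshake.tryFailure(new java.util.concurrent.TimeoutException(
>             "SASL handshake timed out.")),
>         serverConnectTimeoutMs(), TimeUnit.MILLISECONDS);
>   }
> }
> {code}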
> An error like the following is usually caused by this issue, since the
> default hive.spark.client.connect.timeout value (1000ms) used by the remote
> driver for the handshake is a little too short.
> {code}
> 17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>     at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>     at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
>     at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> Caused by: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>     at org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:453)
>     at org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
> {code}