[
https://issues.apache.org/jira/browse/HIVE-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900728#comment-15900728
]
Rui Li commented on HIVE-16071:
-------------------------------
Hi [~xuefuz], let me summarise my point: we're talking about two issues here -
detecting the disconnection and reacting to it. I think the root cause of your
example is that we don't react properly (i.e. we don't fail the future) on
disconnection.
Regarding detecting the disconnection, I suppose we can rely on netty. The
cancelTask serves as further insurance in case netty fails (or takes too long)
to detect it.
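To make the division of labour concrete, here is a minimal netty sketch (the
class and field names, e.g. DisconnectionWatcher and handshakePromise, are
mine and not Hive's actual code): channelInactive is the primary detection
path, and the scheduled task is the insurance.
{code}
import java.util.concurrent.TimeUnit;

import io.netty.channel.Channel;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.concurrent.Promise;

class DisconnectionWatcher extends ChannelInboundHandlerAdapter {
  private final Promise<Void> handshakePromise;

  DisconnectionWatcher(Promise<Void> handshakePromise) {
    this.handshakePromise = handshakePromise;
  }

  @Override
  public void channelInactive(ChannelHandlerContext ctx) throws Exception {
    // Primary path: netty reports that the channel went down.
    handshakePromise.tryFailure(new javax.security.sasl.SaslException(
        "Channel closed before handshake finished."));
    super.channelInactive(ctx);
  }

  static void scheduleCancelTask(Channel channel, Promise<Void> promise,
      long timeoutMs) {
    // Insurance: if netty fails (or takes too long) to detect the
    // disconnection, fail the handshake ourselves after the timeout.
    channel.eventLoop().schedule(() -> {
      if (promise.tryFailure(
          new java.util.concurrent.TimeoutException("Handshake timed out."))) {
        channel.close();
      }
    }, timeoutMs, TimeUnit.MILLISECONDS);
  }
}
{code}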
bq. let cancelTask fail the Future so that Hive stops waiting
As I mentioned in my proposal, I think SaslHandler is in a better position to
do this. SaslHandler is intended for the SASL handshake, and it removes itself
from the pipeline once the handshake finishes. Therefore, if SaslHandler
detects a disconnection, it means the channel was closed before the handshake
finished, and thus we should fail the Future. Do you think it makes sense to
open another JIRA for this?
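Roughly what I have in mind, as a sketch only (this is not the actual
SaslHandler code; evaluate and the field names are placeholders):
{code}
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.util.concurrent.Promise;

abstract class SaslHandlerSketch extends SimpleChannelInboundHandler<byte[]> {
  private final Promise<Void> clientFuture;
  private volatile boolean handshakeDone = false;

  SaslHandlerSketch(Promise<Void> clientFuture) {
    this.clientFuture = clientFuture;
  }

  @Override
  protected void channelRead0(ChannelHandlerContext ctx, byte[] challenge)
      throws Exception {
    if (evaluate(challenge)) {
      // Handshake finished: step out of the pipeline and succeed the future.
      handshakeDone = true;
      ctx.pipeline().remove(this);
      clientFuture.trySuccess(null);
    }
  }

  @Override
  public void channelInactive(ChannelHandlerContext ctx) throws Exception {
    if (!handshakeDone) {
      // Still in the pipeline, so the channel closed mid-handshake.
      clientFuture.tryFailure(new javax.security.sasl.SaslException(
          "Client closed before SASL negotiation finished."));
    }
    super.channelInactive(ctx);
  }

  /** Process one SASL challenge; return true when negotiation completes. */
  protected abstract boolean evaluate(byte[] challenge) throws Exception;
}
{code}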
> Spark remote driver misuses the timeout in RPC handshake
> --------------------------------------------------------
>
> Key: HIVE-16071
> URL: https://issues.apache.org/jira/browse/HIVE-16071
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Attachments: HIVE-16071.patch
>
>
> Based on its property description in HiveConf and the comments in HIVE-12650
> (https://issues.apache.org/jira/browse/HIVE-12650?focusedCommentId=15128979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15128979),
> hive.spark.client.connect.timeout is the timeout for the spark remote driver
> to make a socket connection (channel) to the RPC server. But currently it is
> also used by the remote driver for the RPC client/server handshake, which is
> not right. Instead, hive.spark.client.server.connect.timeout should be used,
> as it is already used by the RpcServer for the handshake.
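> A minimal sketch of the intended split (the configuration accessors below
> are hypothetical stand-ins, not Hive's actual RpcConfiguration API):
> {code}
> import java.util.concurrent.TimeUnit;
>
> import io.netty.bootstrap.Bootstrap;
> import io.netty.channel.Channel;
> import io.netty.channel.ChannelOption;
> import io.netty.util.concurrent.Promise;
>
> final class HandshakeTimeouts {
>   // Hypothetical accessors; the real values come from HiveConf.
>   static long connectTimeoutMs() {
>     return 1000L;   // hive.spark.client.connect.timeout (1000ms default)
>   }
>   static long serverConnectTimeoutMs() {
>     return 90000L;  // hive.spark.client.server.connect.timeout (90s default)
>   }
>
>   static void apply(Bootstrap bootstrap, Channel channel,
>       Promise<Void> handshake) {
>     // The short client timeout only bounds the socket connection.
>     bootstrap.option(ChannelOption.CONNECT_TIMEOUT_MILLIS,
>         (int) connectTimeoutMs());
>     // The SASL handshake waits on the longer server-connect timeout,
>     // as the RpcServer side already does.
>     channel.eventLoop().schedule(
>         () -> handshake.tryFailure(new java.util.concurrent.TimeoutException(
>             "SASL handshake timed out.")),
>         serverConnectTimeoutMs(), TimeUnit.MILLISECONDS);
>   }
> }
> {code}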
> An error like the following is usually caused by this issue, since the
> default hive.spark.client.connect.timeout value (1000ms) used by the remote
> driver for the handshake is a little too short.
> {code}
> 17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>     at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>     at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
>     at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> Caused by: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>     at org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:453)
>     at org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
> {code}