[
https://issues.apache.org/jira/browse/HIVE-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903422#comment-15903422
]
Chaoyu Tang edited comment on HIVE-16071 at 3/9/17 5:24 PM:
------------------------------------------------------------
So we reached the consensus that hive.spark.client.server.connect.timeout
should not be used for the cancelTask on the RPCServer side. The proposed
value for it could be hive.spark.client.connect.timeout.
[~xuefuz] The reason I previously suggested we could consider another
timeout for the cancelTask (a little longer than
hive.spark.client.connect.timeout) is to give the RemoteDriver a little more
time than the RPCServer to time out the handshake. If the timeouts on both
sides are set to exactly the same value, we might quite often see situations
where the termination of the SASL handshake is initiated by the cancelTask on
the RpcServer side, because the timeout on the RemoteDriver side might fire
slightly later for whatever reason. During this short window, the handshake
could still have a chance to succeed if it is not terminated by the cancelTask.
To my understanding, shortening the cancelTask timeout is mainly so that the
RpcServer detects the handshake timeout (fired by the RemoteDriver) sooner; we
still want the RemoteDriver to mainly control the SASL handshake timeout, and
most handshake timeouts should be fired from the RemoteDriver, right?
In addition, I think we should
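As an illustration of the staggering described above, here is a minimal,
self-contained Java sketch. All names (StaggeredTimeouts, HANDSHAKE_TIMEOUT_MS,
CANCEL_TASK_TIMEOUT_MS) are hypothetical, not actual Hive code or config keys;
in real Hive-on-Spark these values come from HiveConf, not constants:
{code}
import java.util.concurrent.*;

public class StaggeredTimeouts {
    // Hypothetical values: the client (RemoteDriver) times out the SASL
    // handshake first; the server-side cancelTask fires slightly later.
    static final long HANDSHAKE_TIMEOUT_MS = 1000;    // client side
    static final long CANCEL_TASK_TIMEOUT_MS = 1200;  // server side, a bit longer

    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(2);
        CompletableFuture<Void> handshake = new CompletableFuture<>();

        // Client-side timeout: fails the pending handshake first.
        timer.schedule(() -> handshake.completeExceptionally(
                new TimeoutException("client: SASL handshake timed out")),
                HANDSHAKE_TIMEOUT_MS, TimeUnit.MILLISECONDS);

        // Server-side cancelTask: a no-op here, because the handshake future
        // is already completed by the time the longer timeout fires.
        timer.schedule(() -> handshake.completeExceptionally(
                new TimeoutException("server: cancelTask terminated handshake")),
                CANCEL_TASK_TIMEOUT_MS, TimeUnit.MILLISECONDS);

        try {
            handshake.get();  // in real code, completed on SASL success
        } catch (ExecutionException | InterruptedException e) {
            System.out.println(e.getCause() != null
                    ? e.getCause().getMessage() : e.getMessage());
        } finally {
            timer.shutdownNow();
        }
    }
}
{code}
With equal values the two terminations race and either side may win; with the
staggered values above, the client-side timeout reliably fires first, which is
the behavior we want.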
> Spark remote driver misuses the timeout in RPC handshake
> --------------------------------------------------------
>
> Key: HIVE-16071
> URL: https://issues.apache.org/jira/browse/HIVE-16071
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Attachments: HIVE-16071.patch
>
>
> Based on its property description in HiveConf and the comments in HIVE-12650
> (https://issues.apache.org/jira/browse/HIVE-12650?focusedCommentId=15128979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15128979),
> hive.spark.client.connect.timeout is the timeout for the spark remote driver
> to make a socket connection (channel) to the RPC server. But currently it is
> also used by the remote driver for the RPC client/server handshake, which is
> not right. Instead, hive.spark.client.server.connect.timeout should be used,
> and it is already used by the RPCServer in the handshake.
> Errors like the following are usually caused by this issue, since the default
> hive.spark.client.connect.timeout value (1000ms), used by the remote driver
> for the handshake, is a little too short.
> {code}
> 17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>     at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>     at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
>     at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> Caused by: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>     at org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:453)
>     at org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
> {code}
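>
> To make the two settings and their defaults concrete, a minimal plain-Java
> sketch (not the actual Hive RPC code; the Properties object stands in for
> HiveConf):
> {code}
> import java.util.Properties;
>
> public class RpcTimeouts {
>     public static void main(String[] args) {
>         Properties conf = new Properties();  // stand-in for HiveConf
>
>         // hive.spark.client.connect.timeout: socket connect only;
>         // default 1000 ms.
>         long connectMs = Long.parseLong(conf.getProperty(
>                 "hive.spark.client.connect.timeout", "1000"));
>
>         // hive.spark.client.server.connect.timeout: handshake/connection
>         // between HiveServer2 and the remote driver; default 90000 ms.
>         long serverConnectMs = Long.parseLong(conf.getProperty(
>                 "hive.spark.client.server.connect.timeout", "90000"));
>
>         System.out.println("socket connect timeout = " + connectMs
>                 + " ms, server connect (handshake) timeout = "
>                 + serverConnectMs + " ms");
>     }
> }
> {code}
> Raising hive.spark.client.connect.timeout can work around the error above,
> but the proper fix (HIVE-16071.patch) is to bound the handshake by the
> server connect timeout instead.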