[ https://issues.apache.org/jira/browse/HIVE-16071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894672#comment-15894672 ]
Chaoyu Tang commented on HIVE-16071:
------------------------------------
Yes, [~lirui]. Increasing hive.spark.client.server.connect.timeout (instead of
hive.spark.client.connect.timeout) could help in my case.
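For anyone hitting this in the meantime, here is a minimal Java sketch of bumping
that timeout (the same property can of course be set in hive-site.xml; the
5-minute value is only an example, and it has to take effect before the Spark
session is created):
{code}
// Minimal workaround sketch; the value is illustrative, not a recommendation.
import org.apache.hadoop.hive.conf.HiveConf;

public class BumpServerConnectTimeout {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    conf.set("hive.spark.client.server.connect.timeout", "300000ms"); // e.g. 5 minutes
    System.out.println(conf.get("hive.spark.client.server.connect.timeout"));
  }
}
{code}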
The cancelTask can take effect and close the channel only when its timeout is set
to a value shorter than the current hive.spark.client.server.connect.timeout (a
rough sketch of that race is below). So for this cancelTask, we could:
1. remove it to make the code more understandable; or
2. leave it as is, since it will not be executed anyway; or
3. use a different HoS timeout configuration (either
hive.spark.client.connect.timeout or a new one) so that we have finer control
over the waiting time on the HS2 side. Adding a new timeout config may not be
desirable since we already have many such configurations.
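To make the race concrete, here is a rough, self-contained sketch; the plain
executor timers below stand in for the actual Rpc/RpcServer code, and the values
are illustrative:
{code}
// Rough sketch only, NOT the actual Hive code: two timers race against the handshake.
// The cancelTask can close the channel only if its timeout is shorter than
// hive.spark.client.server.connect.timeout; otherwise the server-connect wait fails
// the connection first and the cancelTask never gets to run.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class CancelTaskRaceSketch {
  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    long serverConnectTimeoutMs = 2000; // stands in for hive.spark.client.server.connect.timeout
    long cancelTaskTimeoutMs = 5000;    // the cancelTask's timeout, here the longer of the two

    // The "cancelTask": would close the channel if the handshake were still pending.
    ScheduledFuture<?> cancelTask = timer.schedule(
        () -> System.out.println("cancelTask fired: closing channel"),
        cancelTaskTimeoutMs, TimeUnit.MILLISECONDS);

    // The server-connect wait expires first, fails the connection and cancels the
    // cancelTask, so the message above never prints.
    timer.schedule(() -> {
      System.out.println("server connect timeout: connection failed");
      cancelTask.cancel(false);
    }, serverConnectTimeoutMs, TimeUnit.MILLISECONDS);

    Thread.sleep(6000);
    timer.shutdownNow();
  }
}
{code}
With option 3, the cancelTask would instead be scheduled with
hive.spark.client.connect.timeout (or a new, shorter property), so it would
actually have a chance to run before the server-side wait gives up.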
[~xuefuz], [~lirui], [~vanzin], what do you think?
> Spark remote driver misuses the timeout in RPC handshake
> --------------------------------------------------------
>
> Key: HIVE-16071
> URL: https://issues.apache.org/jira/browse/HIVE-16071
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Chaoyu Tang
> Assignee: Chaoyu Tang
> Attachments: HIVE-16071.patch
>
>
> Based on its property description in HiveConf and the comments in HIVE-12650
> (https://issues.apache.org/jira/browse/HIVE-12650?focusedCommentId=15128979&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15128979),
> hive.spark.client.connect.timeout is the timeout for the Spark remote driver to
> make a socket connection (channel) to the RPC server. But currently it is also
> used by the remote driver for the RPC client/server handshake, which is not
> right. Instead, hive.spark.client.server.connect.timeout should be used; it is
> already used by the RpcServer for the handshake.
> Errors like the following are usually caused by this issue, since the default
> hive.spark.client.connect.timeout value (1000ms) used by the remote driver for
> the handshake is a little too short.
> {code}
> 17/02/20 08:46:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
> java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>     at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>     at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
>     at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
> Caused by: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
>     at org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:453)
>     at org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
> {code}
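> For illustration, here is a rough, self-contained sketch of the driver-side wait
> (hypothetical names, not the actual RemoteDriver/Rpc code); it only shows why the
> 1000ms bound produces the error above:
> {code}
> // Hypothetical sketch, NOT the actual Hive code: the remote driver waits for the
> // SASL handshake, but bounds that wait with the short client connect timeout
> // instead of the server connect timeout.
> import java.util.concurrent.CompletableFuture;
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.TimeoutException;
>
> public class HandshakeWaitSketch {
>   public static void main(String[] args) throws Exception {
>     // Stand-in for the SASL handshake the driver waits on; never completes here.
>     CompletableFuture<Void> saslHandshake = new CompletableFuture<>();
>
>     long connectTimeoutMs = 1000;        // hive.spark.client.connect.timeout default
>     long serverConnectTimeoutMs = 90000; // hive.spark.client.server.connect.timeout (illustrative)
>
>     try {
>       // Current behavior: the wait is bounded by the 1000ms connect timeout...
>       saslHandshake.get(connectTimeoutMs, TimeUnit.MILLISECONDS);
>     } catch (TimeoutException e) {
>       // ...so on a busy cluster the channel is closed mid-negotiation and the driver
>       // dies with "Client closed before SASL negotiation finished."
>       System.out.println("handshake timed out after " + connectTimeoutMs + " ms");
>     }
>     // The proposed fix is to bound this wait with serverConnectTimeoutMs instead.
>   }
> }
> {code}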