[
https://issues.apache.org/jira/browse/SPARK-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659548#comment-14659548
]
zengqiuyang commented on SPARK-9629:
------------------------------------
ZK is running well.
What I have verified so far: if I change the ZK server's system time by more
than the negotiated session timeout, the session is shut down.
My ZK server had time fluctuations before. I am now running it with a stable
clock and waiting for the exception to appear again.
Another thing: the two sessions are always lost at the same time.
I would like it to reach RECONNECTED first, rather than changing to SUSPENDED
and losing leadership immediately.
The log in ZK:
2015-08-03 20:18:59,828 [myid:3] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of
stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x34ee39684b70002, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2015-08-03 20:18:59,829 [myid:3] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket
connection for client /192.168.0.146:37829 which had sessionid 0x34ee39684b70002
2015-08-03 20:19:00,252 [myid:3] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of
stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x34ee39684b70001, likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2015-08-03 20:19:00,253 [myid:3] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket
connection for client /192.168.0.146:37828 which had sessionid 0x34ee39684b70001
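
To make the wished-for behaviour concrete, here is a minimal sketch of a policy that keeps leadership through SUSPENDED and gives it up only on LOST. This is not Spark's or Curator's code: the ConnState enum, the LeaderPolicy class and the replay helper are hypothetical names for illustration, and only the state names are taken from the logs and from Curator's ConnectionState.

```python
from enum import Enum


class ConnState(Enum):
    # Hypothetical stand-in for the connection states seen in the logs.
    CONNECTED = "CONNECTED"
    SUSPENDED = "SUSPENDED"
    RECONNECTED = "RECONNECTED"
    LOST = "LOST"


class LeaderPolicy:
    """Hypothetical policy: tolerate SUSPENDED, revoke leadership only on LOST."""

    def __init__(self):
        self.is_leader = True
        self.suspended = False

    def on_state_change(self, state):
        if state is ConnState.SUSPENDED:
            # Connection dropped, but the session may still be alive: wait.
            self.suspended = True
        elif state in (ConnState.CONNECTED, ConnState.RECONNECTED):
            # Same session re-established: carry on as leader.
            self.suspended = False
        elif state is ConnState.LOST:
            # Session is gone: leadership can no longer be assumed.
            self.suspended = False
            self.is_leader = False
        return self.is_leader


def replay(states):
    """Feed a sequence of state changes to a fresh policy; return is_leader."""
    policy = LeaderPolicy()
    for state in states:
        policy.on_state_change(state)
    return policy.is_leader


if __name__ == "__main__":
    print(replay([ConnState.SUSPENDED, ConnState.RECONNECTED]))
    print(replay([ConnState.SUSPENDED, ConnState.LOST]))
```

The trade-off is that while the connection is SUSPENDED the client cannot tell whether its session is still alive on the server, so a policy like this has to accept a window in which leadership is only assumed.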
> Client session timed out, have not heard from server in
> --------------------------------------------------------
>
> Key: SPARK-9629
> URL: https://issues.apache.org/jira/browse/SPARK-9629
> Project: Spark
> Issue Type: Bug
> Components: Deploy
> Affects Versions: 1.4.0, 1.4.1
> Environment: spark1.4.1 ./make-distribution.sh --tgz
> -Dhadoop.version=2.5.2 -Dyarn.version=2.5.2 -Phive -Phive-thriftserver
> -Pyarn
> zookeeper-3.4.6.tar.gz
> standalone HA
> Linux version 2.6.32-358.el6.x86_64 ([email protected]) (gcc
> version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Fri Feb 22 00:31:26
> UTC 2013
> Reporter: zengqiuyang
> Priority: Critical
>
> When running Spark HA, "Client session timed out" appears every few days.
> The log shows a reconnect attempt, but it does not seem to take effect, and
> the master shuts down.
> logs:
> 15/08/05 05:32:57 INFO zookeeper.ClientCnxn: Client session timed out, have
> not heard from server in 37753ms for sessionid 0x34ee39684b70005, closing
> socket connection and attempting reconnect
> 15/08/05 05:32:57 INFO state.ConnectionStateManager: State change: SUSPENDED
> 15/08/05 05:32:57 WARN state.ConnectionStateManager: There are no
> ConnectionStateListeners registered.
> 15/08/05 05:32:57 INFO zookeeper.ClientCnxn: Opening socket connection to
> server h5/192.168.0.18:2181. Will not attempt to authenticate using SASL
> (unknown error)
> 15/08/05 05:32:57 INFO zookeeper.ClientCnxn: Socket connection established to
> h5/192.168.0.18:2181, initiating session
> 15/08/05 05:32:57 INFO zookeeper.ClientCnxn: Session establishment complete
> on server h5/192.168.0.18:2181, sessionid = 0x34ee39684b70005, negotiated
> timeout = 40000
> 15/08/05 05:32:57 INFO state.ConnectionStateManager: State change: RECONNECTED
> 15/08/05 05:32:57 WARN state.ConnectionStateManager: There are no
> ConnectionStateListeners registered.
> 15/08/05 05:32:58 INFO zookeeper.ClientCnxn: Client session timed out, have
> not heard from server in 37753ms for sessionid 0x34ee39684b70006, closing
> socket connection and attempting reconnect
> 15/08/05 05:32:58 INFO state.ConnectionStateManager: State change: SUSPENDED
> 15/08/05 05:32:58 INFO master.ZooKeeperLeaderElectionAgent: We have lost
> leadership
> 15/08/05 05:32:58 ERROR master.Master: Leadership has been revoked -- master
> shutting down.
> 15/08/05 05:32:58 INFO util.Utils: Shutdown hook called
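
The numbers in the quoted log can be checked with a little arithmetic. The sketch below assumes, as far as I understand the ZooKeeper 3.4 Java client, that the client gives up on a connection after two thirds of the negotiated session timeout without hearing from the server. The function names are hypothetical; the two values come from the log above.

```python
def read_timeout_ms(negotiated_timeout_ms):
    # Assumption: client read timeout is 2/3 of the negotiated session timeout.
    return negotiated_timeout_ms * 2 // 3


def client_gives_up(idle_ms, negotiated_timeout_ms):
    # True when the client has been idle longer than its read timeout.
    return idle_ms > read_timeout_ms(negotiated_timeout_ms)


NEGOTIATED_MS = 40000  # "negotiated timeout = 40000" in the log
IDLE_MS = 37753        # "have not heard from server in 37753ms"

if __name__ == "__main__":
    print(read_timeout_ms(NEGOTIATED_MS))           # 26666
    print(client_gives_up(IDLE_MS, NEGOTIATED_MS))  # True
    print(IDLE_MS < NEGOTIATED_MS)                  # True
```

Under that assumption the 37753 ms gap is past the client's read timeout but still below the 40000 ms session timeout, which would fit with session 0x34ee39684b70005 being re-established under the same id after the reconnect.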