[ 
https://issues.apache.org/jira/browse/SPARK-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458888#comment-16458888
 ] 

Ashwin Agate edited comment on SPARK-15544 at 4/30/18 7:05 PM:
---------------------------------------------------------------

Can we please increase the priority of this bug, since it still exists in the 
latest Spark release, 2.3.0? We observed it during an upgrade scenario (with 
Spark 1.6.3) in which we have to shut down ZooKeeper. As a side effect, the 
Spark master on the other nodes shuts down, which is far from ideal.

{code}
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:40588 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: spark-box1:7078 got disassociated, removing it.
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Removing worker worker-20180427105900-spark-box1-7078 on 19
Apr 27 12:57:30 spark-box java[26869]: 18/04/27 12:57:30 INFO Master: Telling app of lost executor: 2
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ClientCnxn: Unable to read additional data from server sessionid 0x1630
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ConnectionStateManager: State change: SUSPENDED
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 INFO ZooKeeperLeaderElectionAgent: We have lost leadership
Apr 27 12:59:20 spark-box java[26869]: 18/04/27 12:59:20 ERROR Master: Leadership has been revoked -- master shutting down.
{code}
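
To check whether another master hit the same failure, the sequence in the log above (a connection state change to SUSPENDED followed by the leadership-revoked error) can be searched for mechanically. The following is only a minimal sketch: the function name `has_revoked_after_suspend` and the log path in the usage comment are hypothetical, and it assumes the two messages appear with the same wording as in the excerpts in this issue.

```shell
#!/bin/sh
# Hypothetical helper, not part of Spark: exits 0 if the given log file
# contains "State change: SUSPENDED" followed later by
# "Leadership has been revoked", and exits 1 otherwise.
has_revoked_after_suspend() {
  awk '
    /State change: SUSPENDED/                  { suspended = 1 }
    suspended && /Leadership has been revoked/ { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$1"
}

# Usage (the path is an assumption; point it at your own master log):
#   has_revoked_after_suspend /path/to/spark-master.log && echo "signature found"
```

A match only shows that the log has the same signature as this report. It does not by itself establish why the ZooKeeper connection was suspended.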



> Bouncing Zookeeper node causes Active spark master to exit
> ----------------------------------------------------------
>
>                 Key: SPARK-15544
>                 URL: https://issues.apache.org/jira/browse/SPARK-15544
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.1
>         Environment: Ubuntu 14.04.  Zookeeper 3.4.6 with 3-node quorum
>            Reporter: Steven Lowenthal
>            Priority: Major
>
> Shutting down a single ZooKeeper node caused the Spark master to exit.  The 
> master should have connected to a second ZooKeeper node. 
> {code:title=log output}
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/1 on worker worker-20160524013212-10.16.28.76-59138
> 16/05/25 18:21:28 INFO master.Master: Launching executor 
> app-20160525182128-0006/2 on worker worker-20160524013204-10.16.21.217-47129
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x154dfc0426b0054, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO zookeeper.ClientCnxn: Unable to read additional data 
> from server sessionid 0x254c701f28d0053, likely server has closed socket, 
> closing socket connection and attempting reconnect
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO state.ConnectionStateManager: State change: SUSPENDED
> 16/05/26 00:16:01 INFO master.ZooKeeperLeaderElectionAgent: We have lost 
> leadership
> 16/05/26 00:16:01 ERROR master.Master: Leadership has been revoked -- master 
> shutting down.
> {code}
> spark-env.sh: 
> {code:title=spark-env.sh}
> export SPARK_LOCAL_DIRS=/ephemeral/spark/local
> export SPARK_WORKER_DIR=/ephemeral/spark/work
> export SPARK_LOG_DIR=/var/log/spark
> export HADOOP_CONF_DIR=/home/ubuntu/hadoop-2.6.3/etc/hadoop
> export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=gn5456-zookeeper-01:2181,gn5456-zookeeper-02:2181,gn5456-zookeeper-03:2181"
> export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
> {code}


