[ https://issues.apache.org/jira/browse/MESOS-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig W updated MESOS-2329:
---------------------------
    Description: 
In a test environment I have experienced an issue where the Mesos Master 
process crashes after its ZooKeeper session expires. The last few messages in 
the INFO log file look like this:

{noformat}
group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
group.cpp:313] Group process (group(4)@192.168.1.4:5050) reconnected to ZooKeeper
group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
group.cpp:790] Syncing group operations: queue size (joins, cancels datas) = (0, 0, 0)
group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
group.cpp:472] ZooKeeper session expired
detector.cpp:138] Detected a new leader: None
master.cpp:1263] The newly elected leader is None
{noformat}

In my environment, I had a single master, 7 slaves, and a single-node 
ZooKeeper ensemble. 

Restarting the master process "fixes" the issue.

  was:
In a test environment I have experience an issue where the Mesos Master process 
crashes after its ZooKeeper session expires. The last messages in the INFO log 
file look like this:

{noformat}
group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
group.cpp:313] Group process (group(4)@192.168.4.42:5050) reconnected to ZooKeeper
group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
group.cpp:790] Syncing group operations: queue size (joins, cancels datas) = (0, 0, 0)
group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
group.cpp:472] ZooKeeper session expired
detector.cpp:138] Detected a new leader: None
master.cpp:1263] The newly elected leader is None
{noformat}

In my environment, I had a single master and 3 slaves. I had a single node 
ZooKeeper ensemble. 

Restarting the mater process "fixes" the issue.


> Mesos master crashes after ZooKeeper session expires
> ----------------------------------------------------
>
>                 Key: MESOS-2329
>                 URL: https://issues.apache.org/jira/browse/MESOS-2329
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.21.1
>         Environment: CentOS 6.5 (kernel 2.6.32-431), Java 1.7.0_55, ZooKeeper 3.4.5
>            Reporter: Craig W
>
> In a test environment I have experienced an issue where the Mesos Master 
> process crashes after its ZooKeeper session expires. The last few messages in 
> the INFO log file look like this:
> {noformat}
> group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
> group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
> group.cpp:313] Group process (group(4)@192.168.1.4:5050) reconnected to ZooKeeper
> group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
> group.cpp:790] Syncing group operations: queue size (joins, cancels datas) = (0, 0, 0)
> group.cpp:418] Lost connection to ZooKeeper, attempting to reconnect ...
> group.cpp:472] ZooKeeper session expired
> detector.cpp:138] Detected a new leader: None
> master.cpp:1263] The newly elected leader is None
> {noformat}
> In my environment, I had a single master, 7 slaves, and a single-node 
> ZooKeeper ensemble. 
> Restarting the master process "fixes" the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
