I just forced an NTP update on all hosts to rule out clock skew. I also
checked that every host can reach all the other hosts on port 6789.
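For reference, the two checks above can be scripted roughly like this. This is only a sketch: it assumes ssh access to each host, that ntpd/ntpdate are installed, and it uses pool.ntp.org as a stand-in for whatever NTP server the hosts actually sync against. The hostnames and monitor IPs are the ones from this thread.

```shell
# Hosts and monitor addresses from this thread (adjust to your cluster).
HOSTS="60z0m02 60zxl02 610wl02 615yl02"
MON_IPS="10.98.2.166 10.98.2.167 10.98.2.173 10.98.2.216"

# Step the clock on each host; ntpd must be stopped while ntpdate runs.
for h in $HOSTS; do
    ssh "$h" "service ntpd stop; ntpdate pool.ntp.org; service ntpd start"
done

# Verify every host can reach every monitor on the mon port (6789).
for h in $HOSTS; do
    for ip in $MON_IPS; do
        ssh "$h" "nc -z -w 2 $ip 6789" \
            && echo "$h -> $ip:6789 ok" \
            || echo "$h -> $ip:6789 UNREACHABLE"
    done
done
```

Note that `nc -z` only proves the TCP port answers; it says nothing about packet loss or latency, which can also break monitor leases.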
I then stopped monitor 0 (60z0m02) and started monitor 1 (60zxl02), but the
3 remaining monitors (1 - 60zxl02, 2 - 610wl02, 4 - 615yl02) still failed to
reach quorum. That led me to believe monitor 4 was the problem, because I
previously had a quorum with monitors 0, 1, 2. So I stopped monitor 4 and
started monitor 0 again, but this time monitors 0, 1, 2 failed to reach a
quorum, which is rather puzzling. All hosts are pretty much idle all the
time, so I can't see why the monitors would be getting stuck.

On Mon, Jul 25, 2016 at 5:18 PM, Joao Eduardo Luis <[email protected]> wrote:
> On 07/25/2016 04:34 PM, Sergio A. de Carvalho Jr. wrote:
>> Thanks, Joao.
>>
>> All monitors have the exact same monmap.
>>
>> I suspect you're right that there might be some communication problem,
>> though. I stopped monitor 1 (60zxl02), but the other 3 monitors still
>> failed to reach a quorum. I could see monitor 0 was still declaring
>> victory but the others were always calling for new elections:
>>
>> 2016-07-25 15:18:59.775144 7f8760af7700  0 log_channel(cluster) log
>> [INF] : mon.60z0m02@0 won leader election with quorum 0,2,4
>>
>> 2016-07-25 15:18:54.702176 7fc1b357d700  1 mon.610wl02@2(electing) e5
>> handle_timecheck drop unexpected msg
>> 2016-07-25 15:18:54.704526 7fc1b357d700  1
>> mon.610wl02@2(electing).data_health(11626) service_dispatch not in
>> quorum -- drop message
>> 2016-07-25 15:19:09.792511 7fc1b3f7e700  1
>> mon.610wl02@2(peon).paxos(paxos recovering c 1318755..1319322)
>> lease_timeout -- calling new election
>> 2016-07-25 15:19:09.792825 7fc1b357d700  0 log_channel(cluster) log
>> [INF] : mon.610wl02 calling new monitor election
>>
>> I'm curious about the "handle_timecheck drop unexpected msg" message.
>
> Timechecks (i.e., checking for clock skew), as well as the data_health
> service (which makes sure you have enough disk space in the mon data dir),
> are only run when you have a quorum.
> If a message is received by a monitor not in a quorum, regardless of
> state, it will be dropped.
>
> Assuming you now took one of the self-appointed leaders out - by shutting
> it down, for instance - you should now check what's causing elections not
> to hold.
>
> In these cases, assuming your 3 monitors do form a quorum, the traditional
> issue tends to be 'lease timeouts'. I.e., the leader fails to provide a
> lease extension on paxos for the peons, and the peons assume the leader
> failed in some form (unresponsive, down, whatever).
>
> Above it does seem a lease timeout was triggered on a peon. This may have
> happened because:
>
> 1. the leader did not extend the lease
> 2. the leader did extend the lease, but the lease was in the past -
>    usually an indication of a clock skew on the leader, on the peons,
>    or both.
> 3. the leader did extend the lease with the correct time, but the peon
>    failed to dispatch the message on time.
>
> Both 1. and 2. may be due to several factors, but most commonly it's
> because the monitor was stuck doing something. This something, more often
> than not, is leveldb. If this is the case, check the size of your leveldb.
> If it is over 5 or 6 GB in size, you may need to manually compact the
> store ('mon compact on start = true', iirc).
>
> HTH
>
> -Joao
>
>> On Mon, Jul 25, 2016 at 4:10 PM, Joao Eduardo Luis <[email protected]> wrote:
>>
>> On 07/25/2016 03:41 PM, Sergio A. de Carvalho Jr.
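Joao's leveldb suggestion can be checked quickly like this. The store path assumes the default mon data dir layout with cluster name "ceph"; adjust if your deployment differs, and treat the snippet as a sketch rather than the definitive procedure.

```shell
# Check how big each monitor's leveldb store has grown (default data dir
# layout assumed: /var/lib/ceph/mon/<cluster>-<ID>/store.db).
du -sh /var/lib/ceph/mon/ceph-*/store.db

# If a store is over roughly 5-6 GB, have the monitor compact it at the
# next start by adding this to ceph.conf and restarting that mon:
#
#   [mon]
#   mon compact on start = true
```

Compaction can take a while on a large store, so expect the monitor to be slow to rejoin elections right after restart.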
>> wrote:
>>
>> In the logs, 2 monitors are constantly reporting that they won the
>> leader election:
>>
>> 60z0m02 (monitor 0):
>> 2016-07-25 14:31:11.644335 7f8760af7700  0 log_channel(cluster) log
>> [INF] : mon.60z0m02@0 won leader election with quorum 0,2,4
>> 2016-07-25 14:31:44.521552 7f8760af7700  1
>> mon.60z0m02@0(leader).paxos(paxos recovering c 1318755..1319320)
>> collect timeout, calling fresh election
>>
>> 60zxl02 (monitor 1):
>> 2016-07-25 14:31:59.542346 7fefdeaed700  1
>> mon.60zxl02@1(electing).elector(11441) init, last seen epoch 11441
>> 2016-07-25 14:32:04.583929 7fefdf4ee700  0 log_channel(cluster) log
>> [INF] : mon.60zxl02@1 won leader election with quorum 1,2,4
>> 2016-07-25 14:32:33.440103 7fefdf4ee700  1
>> mon.60zxl02@1(leader).paxos(paxos recovering c 1318755..1319319)
>> collect timeout, calling fresh election
>>
>> There are two likely scenarios to explain this:
>>
>> 1. The monitors have different monitors in their monmaps - this could
>>    happen if you didn't add the new monitor via 'ceph mon add'. You can
>>    check this by running 'ceph daemon mon.<ID> mon_status' for each of
>>    the monitors in the cluster.
>>
>> 2. Some of the monitors are unable to communicate with each other, and
>>    thus will never acknowledge the same leader. This does not mean you
>>    have two leaders for the same cluster, but it does mean that you
>>    will end up having two monitors declaring victory and becoming the
>>    self-proclaimed leader in the cluster. The peons should still only
>>    belong to one quorum.
>>
>> If this does not help you, try setting 'debug mon = 10' and
>> 'debug ms = 1' on the monitors and check the logs, making sure the
>> monitors get the probes and follow the election process. If you need
>> further assistance, put those logs online somewhere we can access them
>> and we'll try to help you out.
>>
>> -Joao
>>
>> On Mon, Jul 25, 2016 at 3:27 PM, Sergio A. de Carvalho Jr.
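Both of Joao's checks can be run through each monitor's admin socket, which works even when there is no quorum. A rough sketch, assuming ssh access and the monitor IDs from this thread (the admin socket `config set` command is how debug levels are usually raised without restarting):

```shell
# Compare what each monitor believes about the cluster: its state and
# monmap epoch should agree across all monitors.
for id in 60z0m02 60zxl02 610wl02 615yl02; do
    echo "=== mon.$id ==="
    ssh "$id" "ceph daemon mon.$id mon_status" | grep -E '"(state|epoch)"'
done

# Raise monitor and messenger debug levels on one monitor via its admin
# socket, then watch /var/log/ceph/ceph-mon.<ID>.log for election probes.
ceph daemon mon.60z0m02 config set debug_mon 10
ceph daemon mon.60z0m02 config set debug_ms 1
```

Remember to set the debug levels back down afterwards, as `debug ms = 1` makes the logs grow quickly.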
>> <[email protected] <mailto:[email protected]> >> <mailto:[email protected] <mailto:[email protected]>>> >> >> wrote: >> >> Hi, >> >> I have a cluster of 5 hosts running Ceph 0.94.6 on CentOS >> 6.5. On >> each host, there is 1 monitor and 13 OSDs. We had an issue >> with the >> network and for some reason (which I still don't know why), >> the >> servers were restarted. One host is still down, but the >> monitors on >> the 4 remaining servers are failing to enter a quorum. >> >> I managed to get a quorum of 3 monitors by stopping all Ceph >> monitors and OSDs across all machines, and bringing up the >> top 3 >> ranked monitors in order of rank. After a few minutes, the >> 60z0m02 >> monitor (the top ranked one) became the leader: >> >> { >> "name": "60z0m02", >> "rank": 0, >> "state": "leader", >> "election_epoch": 11328, >> "quorum": [ >> 0, >> 1, >> 2 >> ], >> "outside_quorum": [], >> "extra_probe_peers": [], >> "sync_provider": [], >> "monmap": { >> "epoch": 5, >> "fsid": "2f51a247-3155-4bcf-9aee-c6f6b2c5e2af", >> "modified": "2016-04-28 22:26:48.604393", >> "created": "0.000000", >> "mons": [ >> { >> "rank": 0, >> "name": "60z0m02", >> "addr": "10.98.2.166:6789 >> <http://10.98.2.166:6789> <http://10.98.2.166:6789>\/0" >> }, >> { >> "rank": 1, >> "name": "60zxl02", >> "addr": "10.98.2.167:6789 >> <http://10.98.2.167:6789> <http://10.98.2.167:6789>\/0" >> }, >> { >> "rank": 2, >> "name": "610wl02", >> "addr": "10.98.2.173:6789 >> <http://10.98.2.173:6789> <http://10.98.2.173:6789>\/0" >> }, >> { >> "rank": 3, >> "name": "618yl02", >> "addr": "10.98.2.214:6789 >> <http://10.98.2.214:6789> <http://10.98.2.214:6789>\/0" >> }, >> { >> "rank": 4, >> "name": "615yl02", >> "addr": "10.98.2.216:6789 >> <http://10.98.2.216:6789> <http://10.98.2.216:6789>\/0" >> >> >> } >> ] >> } >> } >> >> The other 2 monitors became peons: >> >> "name": "60zxl02", >> "rank": 1, >> "state": "peon", >> "election_epoch": 11328, >> "quorum": [ >> 0, >> 1, >> 2 >> ], >> >> "name": 
"610wl02", >> "rank": 2, >> "state": "peon", >> "election_epoch": 11328, >> "quorum": [ >> 0, >> 1, >> 2 >> ], >> >> I then proceeded to start the fourth monitor, 615yl02 >> (618yl02 is >> powered off), but after more than 2 hours and several >> election >> rounds, the monitors still haven't reached a quorum. The >> monitors >> alternate mostly between "election", "probing" states but >> they often >> seem to be in different election epochs. >> >> Is this normal? >> >> Is there anything I can do to help the monitors elect a >> leader? >> Should I manually remove the dead host's monitor from the >> monitor map? >> >> I left all OSD daemons stopped while the election is going on >> purpose. Is this the best thing to do? Would bringing the >> OSDs up >> help or complicate matters even more? Or doesn't it make >> any difference? >> >> I don't see anything obviously wrong in the monitor logs. >> They're >> mostly filled with messages like the following: >> >> 2016-07-25 14:17:57.806148 7fc1b3f7e700 1 >> mon.610wl02@2(electing).elector(11411) init, last seen >> epoch 11411 >> 2016-07-25 14:17:57.829198 7fc1b7caf700 0 >> log_channel(audit) log >> [DBG] : from='admin socket' entity='admin socket' >> cmd='mon_status' >> args=[]: dispatch >> 2016-07-25 14:17:57.829200 7fc1b7caf700 0 >> log_channel(audit) do_log >> log to syslog >> 2016-07-25 14:17:57.829254 7fc1b7caf700 0 >> log_channel(audit) log >> [DBG] : from='admin socket' entity='admin socket' >> cmd=mon_status >> args=[]: finished >> >> Any help would be hugely appreciated. >> >> Thanks, >> >> Sergio >> >> >> >> >> _______________________________________________ >> ceph-users mailing list >> [email protected] <mailto:[email protected]> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> >> _______________________________________________ >> ceph-users mailing list >> [email protected] <mailto:[email protected]> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> >> >
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
