I just forced an NTP update on all hosts to rule out clock skew. I also
checked that every host can reach all the other hosts on port 6789.
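
For the record, this is roughly what I ran on each host (CentOS 6.5; our
actual NTP server stands in for pool.ntp.org below, and the IPs are the
monitor addresses from the monmap):

    service ntpd stop && ntpdate pool.ntp.org && service ntpd start
    for h in 10.98.2.166 10.98.2.167 10.98.2.173 10.98.2.216; do
        nc -zv $h 6789
    done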

I then stopped monitor 0 (60z0m02) and started monitor 1 (60zxl02), but the
remaining 3 monitors (1 - 60zxl02, 2 - 610wl02, 4 - 615yl02) still failed to
reach quorum. That led me to believe monitor 4 was the problem, because I
previously had a quorum with monitors 0, 1 and 2.

So I stopped monitor 4 and started monitor 0 again, but this time monitors
0, 1, 2 failed to reach a quorum, which is rather puzzling.
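
For reference, I'm stopping and starting the monitors with the sysvinit
script, which I believe is the right mechanism for Hammer on CentOS 6.5:

    service ceph stop mon.615yl02
    service ceph start mon.60z0m02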

All hosts are pretty much idle all the time, so I can't see why the monitors
would be getting stuck.
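
(For what it's worth, "idle" here is my reading of top and iostat on each
host, along the lines of:

    top -b -n 1 | head -20
    iostat -x 2 5

so I may well be missing something.)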



On Mon, Jul 25, 2016 at 5:18 PM, Joao Eduardo Luis <[email protected]> wrote:

> On 07/25/2016 04:34 PM, Sergio A. de Carvalho Jr. wrote:
>
>> Thanks, Joao.
>>
>> All monitors have the exact same monmap.
>>
>> I suspect you're right that there might be some communication problem
>> though. I stopped monitor 1 (60zxl02), but the other 3 monitors still
>> failed to reach a quorum. I could see monitor 0 was still declaring
>> victory but the others were always calling for new elections:
>>
>> 2016-07-25 15:18:59.775144 7f8760af7700  0 log_channel(cluster) log
>> [INF] : mon.60z0m02@0 won leader election with quorum 0,2,4
>>
>>
>> 2016-07-25 15:18:54.702176 7fc1b357d700  1 mon.610wl02@2(electing) e5
>> handle_timecheck drop unexpected msg
>> 2016-07-25 15:18:54.704526 7fc1b357d700  1
>> mon.610wl02@2(electing).data_health(11626) service_dispatch not in
>> quorum -- drop message
>> 2016-07-25 15:19:09.792511 7fc1b3f7e700  1
>> mon.610wl02@2(peon).paxos(paxos recovering c 1318755..1319322)
>> lease_timeout -- calling new election
>> 2016-07-25 15:19:09.792825 7fc1b357d700  0 log_channel(cluster) log
>> [INF] : mon.610wl02 calling new monitor election
>>
>>
>> I'm curious about the "handle_timecheck drop unexpected msg" message.
>>
>
> timechecks (i.e., checking for clock skew), as well as the data_health
> service (which makes sure you have enough disk space in the mon data dir),
> are only run when you have a quorum. If one of those messages is received
> by a monitor not in a quorum, regardless of its state, it will be dropped.
>
> Assuming you now took one of the self-appointed leaders out - by shutting
> it down, for instance - you should check what's causing elections not to
> hold.
>
> In these cases, assuming your 3 monitors do form a quorum, the traditional
> issue tends to be 'lease timeouts'. I.e., the leader fails to provide a
> lease extension on paxos for the peons, and the peons assume the leader
> failed in some form (unresponsive, down, whatever).
>
> Above, it does seem a lease timeout was triggered on a peon. This may have
> happened because:
>
> 1. the leader did not extend the lease;
> 2. the leader did extend the lease, but the lease was in the past - usually
> an indication of clock skew on the leader, on the peons, or both;
> 3. the leader did extend the lease with the correct time, but the peon
> failed to dispatch the message on time.
>
> Both 1. and 2. may be due to several factors, but most commonly it's
> because the monitor was stuck doing something. This something, more often
> than not, is leveldb. If this is the case, check the size of your leveldb.
> If it is over 5 or 6GB in size, you may need to manually compact the store
> (mon compact on start = true, iirc).
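>
> For reference, a sketch of both knobs, assuming the default mon data path
> (substitute your monitor id):
>
>   # check the store size on each monitor host
>   du -sh /var/lib/ceph/mon/ceph-<ID>/store.db
>
>   # ceph.conf on the monitors
>   [mon]
>   mon compact on start = true
>   # 'mon lease' (default 5s, iirc) is the lease interval behind the
>   # timeouts above; raising it is a blunt workaround, not a fix
>   mon lease = 10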
>
> HTH
>
>   -Joao
>
>
>>
>>
>> On Mon, Jul 25, 2016 at 4:10 PM, Joao Eduardo Luis <[email protected]> wrote:
>>
>>     On 07/25/2016 03:41 PM, Sergio A. de Carvalho Jr. wrote:
>>
>>         In the logs, these 2 monitors are constantly reporting that they
>>         won the leader election:
>>
>>         60z0m02 (monitor 0):
>>         2016-07-25 14:31:11.644335 7f8760af7700  0 log_channel(cluster) log
>>         [INF] : mon.60z0m02@0 won leader election with quorum 0,2,4
>>         2016-07-25 14:31:44.521552 7f8760af7700  1
>>         mon.60z0m02@0(leader).paxos(paxos recovering c 1318755..1319320)
>>         collect timeout, calling fresh election
>>
>>         60zxl02 (monitor 1):
>>         2016-07-25 14:31:59.542346 7fefdeaed700  1
>>         mon.60zxl02@1(electing).elector(11441) init, last seen epoch 11441
>>         2016-07-25 14:32:04.583929 7fefdf4ee700  0 log_channel(cluster) log
>>         [INF] : mon.60zxl02@1 won leader election with quorum 1,2,4
>>         2016-07-25 14:32:33.440103 7fefdf4ee700  1
>>         mon.60zxl02@1(leader).paxos(paxos recovering c 1318755..1319319)
>>         collect timeout, calling fresh election
>>
>>
>>
>>     There are two likely scenarios to explain this:
>>
>>     1. The monitors have different monitors in their monmaps - this
>>     could happen if you didn't add the new monitor via 'ceph mon add'.
>>     You can check this by running 'ceph daemon mon.<ID> mon_status' for
>>     each of the monitors in the cluster (see the example after this
>>     list).
>>
>>     2. Some of the monitors are unable to communicate with each other,
>>     and thus will never acknowledge the same leader. This does not mean
>>     you have two leaders for the same cluster, but it does mean that you
>>     will end up with two monitors declaring victory and becoming the
>>     self-proclaimed leader. The peons should still only belong to one
>>     quorum.
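>>
>>     For instance, on the host running mon.60z0m02 (and likewise on each
>>     of the others):
>>
>>         ceph daemon mon.60z0m02 mon_status
>>
>>     The "monmap" section of the output should be identical everywhere.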
>>
>>     If this does not help you, try setting 'debug mon = 10' and 'debug
>>     ms = 1' on the monitors and check the logs, making sure the monitors
>>     get the probes and follow the election process. If you need further
>>     assistance, put those logs online somewhere we can access them and
>>     we'll try to help you out.
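>>
>>     Since 'ceph tell ... injectargs' needs a working cluster, the admin
>>     socket is probably the safer way to raise these at runtime, e.g.:
>>
>>         ceph daemon mon.<ID> config set debug_mon 10
>>         ceph daemon mon.<ID> config set debug_ms 1
>>
>>     (or set them in ceph.conf and restart the monitor.)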
>>
>>        -Joao
>>
>>
>>
>>         On Mon, Jul 25, 2016 at 3:27 PM, Sergio A. de Carvalho Jr.
>>         <[email protected]> wrote:
>>
>>              Hi,
>>
>>              I have a cluster of 5 hosts running Ceph 0.94.6 on CentOS 6.5.
>>              On each host, there is 1 monitor and 13 OSDs. We had an issue
>>              with the network and, for reasons I still don't know, the
>>              servers were restarted. One host is still down, but the
>>              monitors on the 4 remaining servers are failing to enter a
>>              quorum.
>>
>>              I managed to get a quorum of 3 monitors by stopping all Ceph
>>              monitors and OSDs across all machines, and bringing up the top
>>              3 ranked monitors in order of rank. After a few minutes, the
>>              60z0m02 monitor (the top ranked one) became the leader:
>>
>>              {
>>                   "name": "60z0m02",
>>                   "rank": 0,
>>                   "state": "leader",
>>                   "election_epoch": 11328,
>>                   "quorum": [
>>                       0,
>>                       1,
>>                       2
>>                   ],
>>                   "outside_quorum": [],
>>                   "extra_probe_peers": [],
>>                   "sync_provider": [],
>>                   "monmap": {
>>                       "epoch": 5,
>>                       "fsid": "2f51a247-3155-4bcf-9aee-c6f6b2c5e2af",
>>                       "modified": "2016-04-28 22:26:48.604393",
>>                       "created": "0.000000",
>>                       "mons": [
>>                           {
>>                               "rank": 0,
>>                               "name": "60z0m02",
>>                               "addr": "10.98.2.166:6789\/0"
>>                           },
>>                           {
>>                               "rank": 1,
>>                               "name": "60zxl02",
>>                               "addr": "10.98.2.167:6789\/0"
>>                           },
>>                           {
>>                               "rank": 2,
>>                               "name": "610wl02",
>>                               "addr": "10.98.2.173:6789\/0"
>>                           },
>>                           {
>>                               "rank": 3,
>>                               "name": "618yl02",
>>                               "addr": "10.98.2.214:6789\/0"
>>                           },
>>                           {
>>                               "rank": 4,
>>                               "name": "615yl02",
>>                               "addr": "10.98.2.216:6789\/0"
>>                           }
>>                       ]
>>                   }
>>              }
>>
>>              The other 2 monitors became peons:
>>
>>              "name": "60zxl02",
>>                   "rank": 1,
>>                   "state": "peon",
>>                   "election_epoch": 11328,
>>                   "quorum": [
>>                       0,
>>                       1,
>>                       2
>>                   ],
>>
>>              "name": "610wl02",
>>                   "rank": 2,
>>                   "state": "peon",
>>                   "election_epoch": 11328,
>>                   "quorum": [
>>                       0,
>>                       1,
>>                       2
>>                   ],
>>
>>              I then proceeded to start the fourth monitor, 615yl02
>>              (618yl02 is powered off), but after more than 2 hours and
>>              several election rounds, the monitors still haven't reached a
>>              quorum. The monitors alternate mostly between the "electing"
>>              and "probing" states, but they often seem to be in different
>>              election epochs.
>>
>>              Is this normal?
>>
>>              Is there anything I can do to help the monitors elect a
>>              leader? Should I manually remove the dead host's monitor from
>>              the monitor map?
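>>
>>              (I assume that would be either 'ceph mon remove 618yl02',
>>              which presumably needs a quorum to commit, or the offline
>>              route with the monitors stopped, something like:
>>
>>                  ceph-mon -i 60z0m02 --extract-monmap /tmp/monmap
>>                  monmaptool --rm 618yl02 /tmp/monmap
>>                  ceph-mon -i 60z0m02 --inject-monmap /tmp/monmap
>>
>>              repeated for each surviving monitor.)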
>>
>>              I purposely left all OSD daemons stopped while the election
>>              is going on. Is this the best thing to do? Would bringing the
>>              OSDs up help or complicate matters even more? Or doesn't it
>>              make any difference?
>>
>>              I don't see anything obviously wrong in the monitor logs.
>>              They're mostly filled with messages like the following:
>>
>>              2016-07-25 14:17:57.806148 7fc1b3f7e700  1
>>              mon.610wl02@2(electing).elector(11411) init, last seen epoch 11411
>>              2016-07-25 14:17:57.829198 7fc1b7caf700  0 log_channel(audit) log
>>              [DBG] : from='admin socket' entity='admin socket' cmd='mon_status'
>>              args=[]: dispatch
>>              2016-07-25 14:17:57.829200 7fc1b7caf700  0 log_channel(audit) do_log
>>              log to syslog
>>              2016-07-25 14:17:57.829254 7fc1b7caf700  0 log_channel(audit) log
>>              [DBG] : from='admin socket' entity='admin socket' cmd=mon_status
>>              args=[]: finished
>>
>>              Any help would be hugely appreciated.
>>
>>              Thanks,
>>
>>              Sergio
>>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
