Hi all, my monitor mon3 is not able to rejoin the cluster (mon1, mon2 and mon3, running the stable Emperor release). I tried to recreate and inject a new monmap on all three mons, but only mon1 and mon2 come up and join the quorum.
After enabling debug logging on mon3, I get the following:
2014-01-30 08:51:03.823669 7f39b3f56700 10 mon.ceph-mon3@2(probing) e3 handle_probe_reply mon.1 192.168.135.32:6789/0mon_probe(reply c7b12656-15a6-41b0-963f-4f47c62497dc name ceph-mon2 quorum 0,1 paxos( fc 1 lc 160 )) v5
2014-01-30 08:51:03.823678 7f39b3f56700 10 mon.ceph-mon3@2(probing) e3 monmap is e3: 3 mons at {mon.ceph-mon1=192.168.135.31:6789/0,mon.ceph-mon2=192.168.135.32:6789/0,mon.ceph-mon3=192.168.135.33:6789/0}
2014-01-30 08:51:03.823701 7f39b3f56700 10 mon.ceph-mon3@2(probing) e3 peer name is mon.ceph-mon2
2014-01-30 08:51:03.823706 7f39b3f56700 10 mon.ceph-mon3@2(probing) e3 existing quorum 0,1
2014-01-30 08:51:03.823708 7f39b3f56700 10 mon.ceph-mon3@2(probing) e3 peer paxos version 160 vs my version 154 (ok)
2014-01-30 08:51:03.823711 7f39b3f56700 10 mon.ceph-mon3@2(probing) e3 ready to join, but i'm not in the monmap or my addr is blank, trying to join
But why does mon3 report that it is not in the monmap ("but i'm not in the monmap")?
Checking the source at
https://github.com/ceph/ceph/blob/emperor/src/mon/Monitor.cc
the relevant branch is:
    // --> the condition in question
    if (monmap->contains(name) &&
        !monmap->get_addr(name).is_blank_ip()) {
      // i'm part of the cluster; just initiate a new election
      start_election();
    } else {
      dout(10) << " ready to join, but i'm not in the monmap or my addr is blank, trying to join" << dendl;
      messenger->send_message(new MMonJoin(monmap->fsid, name, messenger->get_myaddr()),
                              monmap->get_inst(*m->quorum.begin()));
    }
The monmap on mon3 looks like this:
root@ceph-mon3:/var/log/ceph# ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-mon3.asok mon_status
{ "name": "ceph-mon3",
"rank": 2,
"state": "probing",
"election_epoch": 0,
"quorum": [],
"outside_quorum": [],
"extra_probe_peers": [],
"sync_provider": [],
"monmap": { "epoch": 3,
"fsid": "c7b12656-15a6-41b0-963f-4f47c62497dc",
"modified": "2014-01-30 08:27:28.808771",
"created": "2014-01-30 08:27:28.808771",
"mons": [
{ "rank": 0,
"name": "mon.ceph-mon1",
"addr": "192.168.135.31:6789\/0"},
{ "rank": 1,
"name": "mon.ceph-mon2",
"addr": "192.168.135.32:6789\/0"},
{ "rank": 2,
"name": "mon.ceph-mon3",
"addr": "192.168.135.33:6789\/0"}]}}
So the condition "monmap->contains(name) && !monmap->get_addr(name).is_blank_ip()" should be true, shouldn't it? Yet start_election() is never called.
Can somebody help me here?
regards
Danny
More info about mon3:
root@ceph-mon3:/var/log/ceph# hostname -a
ceph-mon3
root@ceph-mon3:/var/log/ceph# netstat -tulpen | grep ceph-mon
tcp 0 0 192.168.135.33:6789 0.0.0.0:* LISTEN 0 635369 2164/ceph-mon
root@ceph-mon3:/var/log/ceph# cat /etc/hosts
127.0.0.1 localhost
192.168.135.33 ceph-mon3.dtnet.de ceph-mon3
admin@ceph-admin:~/cluster1$ ceph -s
cluster c7b12656-15a6-41b0-963f-4f47c62497dc
health HEALTH_WARN 192 pgs degraded; 192 pgs stale; 192 pgs stuck stale; 192 pgs stuck unclean; 1 mons down, quorum 0,1 ceph-mon1,ceph-mon2
monmap e3: 3 mons at {ceph-mon1=192.168.135.31:6789/0,ceph-mon2=192.168.135.32:6789/0,ceph-mon3=192.168.135.33:6789/0}, election epoch 230, quorum 0,1 ceph-mon1,ceph-mon2
osdmap e28: 1 osds: 1 up, 1 in
pgmap v38: 192 pgs, 3 pools, 0 bytes data, 0 objects
36388 kB used, 3724 GB / 3724 GB avail
192 stale+active+degraded
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
