Hi Greg 

I have seen this problem before in my cluster. 

    * What Ceph version are you running?
    * Did you make any recent changes to the cluster that could have caused
      this problem?

You identified it correctly: the only problem is that ceph-mon-2003 is
listening on the wrong port. It should listen on port 6789, like the other two
monitors. I resolved it by cleanly removing the affected monitor node from the
cluster and adding it back.
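
The remove-and-re-add procedure can be sketched as a shell script. This is a
minimal sketch, not necessarily the exact steps used here: it follows the manual
remove/add monitor steps from the Ceph documentation and assumes an
Ubuntu/Upstart host like the one in the original message (`start ceph-mon-all`).
The `/tmp` file names, the `.bak` directory and the `run`/`readd_mon` helpers
are hypothetical. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: remove a monitor that registered on the wrong port, then re-add it
# on 6789. Paths, /tmp file names and the Upstart job name are assumptions
# about this (Ubuntu/Upstart) setup. DRY_RUN=1 (default) only prints commands.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: readd_mon <mon-id> <ip>
readd_mon() {
  local id="$1" ip="$2"
  local data="/var/lib/ceph/mon/ceph-$id"

  run stop ceph-mon-all                        # stop the broken monitor on this host
  run ceph mon remove "$id"                    # drop it from the monmap (needs quorum)
  run mv "$data" "$data.bak"                   # keep the old store instead of deleting it
  run ceph mon getmap -o /tmp/monmap           # current monmap, now without this mon
  run ceph auth get mon. -o /tmp/mon.keyring   # the shared mon. key
  run mkdir -p "$data"
  run ceph-mon -i "$id" --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
  run ceph mon add "$id" "$ip:6789"            # register it explicitly on port 6789
  run ceph-mon -i "$id" --public-addr "$ip:6789"  # start it bound to that address
}

readd_mon "${1:-ceph-mon-2003}" "${2:-10.30.66.15}"
```

Review the printed commands first, then run it for real with something like
`DRY_RUN=0 ./readd-mon.sh ceph-mon-2003 10.30.66.15` (script name
hypothetical). Moving the old store aside rather than deleting it leaves a way
back if the re-add does not go as planned.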

Regards 

Karan 
----- Original Message -----

From: "Greg Poirier" <[email protected]> 
To: [email protected] 
Sent: Tuesday, 4 February, 2014 10:50:21 PM 
Subject: [ceph-users] Ceph MON can no longer join quorum 

I have a MON that at some point lost connectivity to the rest of the cluster 
and now cannot rejoin. 

Each time I restart it, it looks like it's attempting to create a new MON and 
join the cluster, but the rest of the cluster rejects it, because the new one 
isn't in the monmap. 

I don't know why it suddenly decided it needed to be a new MON. 

I am not really sure where to start. 

root@ceph-mon-2003:/var/log/ceph# ceph -s 
cluster 4167d5f2-2b9e-4bde-a653-f24af68a45f8
health HEALTH_ERR 1 pgs inconsistent; 2 pgs peering; 126 pgs stale; 2 pgs stuck inactive; 126 pgs stuck stale; 2 pgs stuck unclean; 10 requests are blocked > 32 sec; 1 scrub errors; 1 mons down, quorum 0,1 ceph-mon-2001,ceph-mon-2002
monmap e2: 3 mons at {ceph-mon-2001=10.30.66.13:6789/0,ceph-mon-2002=10.30.66.14:6789/0,ceph-mon-2003=10.30.66.15:6800/0}, election epoch 12964, quorum 0,1 ceph-mon-2001,ceph-mon-2002

Notice ceph-mon-2003 at 10.30.66.15:6800, whereas the other two monitors are
on port 6789.
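
To spot this kind of mismatch without eyeballing the output, a small shell
filter can pull the address list out of the monmap line and print any monitor
that is not on the default monitor port 6789. This is a minimal sketch: the
function name `mons_off_default_port` is hypothetical, and it assumes the
`mons at {name=ip:port/nonce,...}` format of the `ceph -s` output shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list monitors in a monmap line that are not on port 6789.
# Assumes the "mons at {name=ip:port/nonce,...}" format printed by `ceph -s`.
mons_off_default_port() {
  tr -d '\n' |                          # rejoin lines wrapped by the terminal or mailer
    grep -o 'mons at {[^}]*}' |         # keep only the monitor address list
    sed 's/^mons at {//; s/}$//' |      # strip the "mons at {" prefix and closing brace
    tr -d ' ' | tr ',' '\n' |           # one name=ip:port/nonce entry per line
    awk -F'[=:/]' 'NF >= 3 && $3 != "6789" { print $1, $2 ":" $3 }'
}
```

On the cluster above, `ceph -s | mons_off_default_port` should print
`ceph-mon-2003 10.30.66.15:6800`; on a healthy monmap it prints nothing.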

If I try to start ceph-mon-all, it will be listening on some other port... 

root@ceph-mon-2003:/var/log/ceph# start ceph-mon-all 
ceph-mon-all start/running 
root@ceph-mon-2003:/var/log/ceph# ps -ef | grep ceph 
root 6930 1 31 15:49 ? 00:00:00 /usr/bin/ceph-mon --cluster=ceph -i ceph-mon-2003 -f
root 6931 1 3 15:49 ? 00:00:00 python /usr/sbin/ceph-create-keys --cluster=ceph -i ceph-mon-2003

root@ceph-mon-2003:/var/log/ceph# ceph -s 
2014-02-04 15:49:56.854866 7f9cf422d700 0 -- :/1007028 >> 10.30.66.15:6789/0 pipe(0x7f9cf0021370 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f9cf00215d0).fault
cluster 4167d5f2-2b9e-4bde-a653-f24af68a45f8
health HEALTH_ERR 1 pgs inconsistent; 2 pgs peering; 126 pgs stale; 2 pgs stuck inactive; 126 pgs stuck stale; 2 pgs stuck unclean; 10 requests are blocked > 32 sec; 1 scrub errors; 1 mons down, quorum 0,1 ceph-mon-2001,ceph-mon-2002
monmap e2: 3 mons at {ceph-mon-2001=10.30.66.13:6789/0,ceph-mon-2002=10.30.66.14:6789/0,ceph-mon-2003=10.30.66.15:6800/0}, election epoch 12964, quorum 0,1 ceph-mon-2001,ceph-mon-2002

Suggestions? 

_______________________________________________ 
ceph-users mailing list 
[email protected] 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
