On 06/03/2014 09:19 AM, Steffen Thorhauer wrote:
Hi,
I'm in the process of upgrading my Ceph cluster from Emperor to Firefly.
After upgrading my 3 mons, one of them is out of quorum.

ceph health detail
HEALTH_WARN 1 mons down, quorum 0,2 u124-11,u124-13
mon.u124-12 (rank 1) addr 10.37.124.12:6789/0 is down (out of quorum)

I have tons of the following log entries in ceph-mon.u124-12.log:
2014-06-03 09:04:50.648461 7f5879635700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x45d9900 sd=9 :46071 s=2 pgs=2259869 cs=182253 l=0 c=0x2f81760).fault, initiating reconnect
2014-06-03 09:04:50.648903 7f587772a700 10 mon.u124-12@1(electing) e2 ms_get_authorizer for mon

Any idea?

I found some more, different lines in ceph-mon.u124-12.log:

2014-06-03 10:03:26.026831 7f6436e71700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39338 s=2 pgs=2394287 cs=54115 l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.026910 7f6437073700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39338 s=1 pgs=2394287 cs=54116 l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.027314 7f6437073700 10 mon.u124-12@1(electing) e2 ms_get_authorizer for mon
2014-06-03 10:03:26.027558 7f6434910700 10 mon.u124-12@1(electing) e2 join_election
2014-06-03 10:03:26.027570 7f6434910700 10 mon.u124-12@1(electing) e2 _reset
2014-06-03 10:03:26.027575 7f6434910700 10 mon.u124-12@1(electing) e2 cancel_probe_timeout (none scheduled)
2014-06-03 10:03:26.027579 7f6434910700 10 mon.u124-12@1(electing) e2 timecheck_finish
2014-06-03 10:03:26.027598 7f6434910700 10 mon.u124-12@1(electing) e2 scrub_reset
2014-06-03 10:03:26.027615 7f6434910700 10 mon.u124-12@1(electing) e2 start_election
2014-06-03 10:03:26.027619 7f6434910700 10 mon.u124-12@1(electing) e2 _reset
2014-06-03 10:03:26.027623 7f6434910700 10 mon.u124-12@1(electing) e2 cancel_probe_timeout (none scheduled)
2014-06-03 10:03:26.027625 7f6434910700 10 mon.u124-12@1(electing) e2 timecheck_finish
2014-06-03 10:03:26.027628 7f6434910700 10 mon.u124-12@1(electing) e2 scrub_reset
2014-06-03 10:03:26.027631 7f6434910700 10 mon.u124-12@1(electing) e2 cancel_probe_timeout (none scheduled)
2014-06-03 10:03:26.027646 7f6434910700 0 log [INF] : mon.u124-12 calling new monitor election
2014-06-03 10:03:26.027728 7f6434910700 5 mon.u124-12@1(electing).elector(1499) start -- can i be leader?
2014-06-03 10:03:26.027820 7f6434910700 1 mon.u124-12@1(electing).elector(1499) init, last seen epoch 1499
2014-06-03 10:03:26.027988 7f6434910700 20 mon.u124-12@1(electing) e2 have connection
2014-06-03 10:03:26.027993 7f6434910700 20 mon.u124-12@1(electing) e2 ms_dispatch existing session MonSession: mon.2 10.37.124.13:6789/0 is openallow * for mon.2 10.37.124.13:6789/0
2014-06-03 10:03:26.028008 7f6434910700 20 mon.u124-12@1(electing) e2 caps allow *
2014-06-03 10:03:26.028012 7f6434910700 20 is_capable service=mon command= exec on cap allow *
2014-06-03 10:03:26.028017 7f6434910700 20 allow so far , doing grant allow *
2014-06-03 10:03:26.028021 7f6434910700 20  allow all
2014-06-03 10:03:26.028056 7f6434910700 5 mon.u124-12@1(electing).elector(1499) handle_ack from mon.2
2014-06-03 10:03:26.028063 7f6434910700 5 mon.u124-12@1(electing).elector(1499) so far i have {1=8796093022207,2=68719476735}
2014-06-03 10:03:26.028093 7f6434910700 20 mon.u124-12@1(electing) e2 have connection
2014-06-03 10:03:26.028097 7f6434910700 20 mon.u124-12@1(electing) e2 ms_dispatch existing session MonSession: mon.1 10.37.124.12:6789/0 is openallow * for mon.1 10.37.124.12:6789/0
2014-06-03 10:03:26.028108 7f6434910700 20 mon.u124-12@1(electing) e2 caps allow *
2014-06-03 10:03:26.028114 7f6434910700 1 mon.u124-12@1(electing).paxos(paxos recovering c 27937488..27938103) is_readable now=2014-06-03 10:03:26.028115 lease_expire=0.000000 has v0 lc 27938103
2014-06-03 10:03:26.028130 7f6434910700 1 mon.u124-12@1(electing).paxos(paxos recovering c 27937488..27938103) is_readable now=2014-06-03 10:03:26.028132 lease_expire=0.000000 has v0 lc 27938103
2014-06-03 10:03:26.028580 7f6436e71700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39339 s=2 pgs=2394288 cs=54117 l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.028677 7f6437073700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39339 s=1 pgs=2394288 cs=54118 l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.029003 7f6434910700 20 mon.u124-12@1(electing) e2 have connection
2014-06-03 10:03:26.029011 7f6434910700 20 mon.u124-12@1(electing) e2 ms_dispatch existing session MonSession: mon.2 10.37.124.13:6789/0 is openallow * for mon.2 10.37.124.13:6789/0
2014-06-03 10:03:26.029028 7f6434910700 20 mon.u124-12@1(electing) e2 caps allow *
2014-06-03 10:03:26.029034 7f6434910700 20 is_capable service=mon command= exec on cap allow *
2014-06-03 10:03:26.029041 7f6434910700 20 allow so far , doing grant allow *
2014-06-03 10:03:26.029044 7f6434910700 20  allow all
2014-06-03 10:03:26.029069 7f6434910700 5 mon.u124-12@1(electing).elector(1499) handle_ack from mon.2
2014-06-03 10:03:26.029076 7f6434910700 5 mon.u124-12@1(electing).elector(1499) so far i have {1=8796093022207,2=68719476735}
2014-06-03 10:03:26.029136 7f6437073700 10 mon.u124-12@1(electing) e2 ms_get_authorizer for mon
2014-06-03 10:03:26.030439 7f6436e71700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39340 s=2 pgs=2394289 cs=54119 l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.030523 7f6437073700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39340 s=1 pgs=2394289 cs=54120 l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.030933 7f6437073700 10 mon.u124-12@1(electing) e2 ms_get_authorizer for mon
2014-06-03 10:03:26.032189 7f6436e71700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39341 s=2 pgs=2394290 cs=54121 l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.032269 7f6437073700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39341 s=1 pgs=2394290 cs=54122 l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.032732 7f6437073700 10 mon.u124-12@1(electing) e2 ms_get_authorizer for mon
2014-06-03 10:03:26.033306 7f6432f0b700 10 mon.u124-12@1(electing) e2 ms_verify_authorizer 10.37.124.11:6789/0 mon protocol 2
2014-06-03 10:03:26.033536 7f6432f0b700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422800 sd=13 :6789 s=0 pgs=0 cs=0 l=0 c=0x340d700).accept connect_seq 54122 vs existing 54122 state wait
2014-06-03 10:03:26.033609 7f6434910700 10 mon.u124-12@1(electing) e2 ms_handle_reset 0x340d700 10.37.124.11:6789/0
2014-06-03 10:03:26.034352 7f6434910700 20 mon.u124-12@1(electing) e2 have connection
2014-06-03 10:03:26.034356 7f6434910700 20 mon.u124-12@1(electing) e2 ms_dispatch existing session MonSession: mon.0 10.37.124.11:6789/0 is openallow * for mon.0 10.37.124.11:6789/0
2014-06-03 10:03:26.034368 7f6434910700 20 mon.u124-12@1(electing) e2 caps allow *
2014-06-03 10:03:26.034372 7f6434910700 20 is_capable service=mon command= exec on cap allow *
2014-06-03 10:03:26.034375 7f6434910700 20 allow so far , doing grant allow *
2014-06-03 10:03:26.034378 7f6434910700 20  allow all
2014-06-03 10:03:26.034393 7f6434910700 5 mon.u124-12@1(electing).elector(1499) handle_propose from mon.0
2014-06-03 10:03:26.034398 7f6434910700 5 mon.u124-12@1(electing).elector(1499) defer to 0
2014-06-03 10:03:26.034599 7f6432f0b700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422800 sd=13 :6789 s=2 pgs=2394291 cs=54123 l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.034703 7f6436e71700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422800 sd=13 :6789 s=1 pgs=2394291 cs=54124 l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.035171 7f6436e71700 10 mon.u124-12@1(electing) e2 ms_get_authorizer for mon
2014-06-03 10:03:26.035351 7f6437073700 10 mon.u124-12@1(electing) e2 ms_verify_authorizer 10.37.124.11:6789/0 mon protocol 2
2014-06-03 10:03:26.035566 7f6437073700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422f80 sd=11 :6789 s=0 pgs=0 cs=0 l=0 c=0x340d860).accept connect_seq 54124 vs existing 54124 state connecting
2014-06-03 10:03:26.035660 7f6434910700 10 mon.u124-12@1(electing) e2 ms_handle_reset 0x340d860 10.37.124.11:6789/0
2014-06-03 10:03:26.036539 7f6437073700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422f80 sd=11 :6789 s=2 pgs=2394292 cs=54125 l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.036627 7f6432f0b700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422f80 sd=11 :6789 s=1 pgs=2394292 cs=54126 l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.037063 7f6432f0b700 10 mon.u124-12@1(electing) e2 ms_get_authorizer for mon
2014-06-03 10:03:26.038382 7f6437073700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422f80 sd=11 :39344 s=2 pgs=2394293 cs=54127 l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.038832 7f6432f0b700 10 mon.u124-12@1(electing) e2 ms_get_authorizer for mon
2014-06-03 10:03:26.040188 7f6437073700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422f80 sd=11 :39345 s=2 pgs=2394294 cs=54129 l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.040275 7f6432f0b700 0 -- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(0x3422f80 sd=11 :39345 s=1 pgs=2394294 cs=54130 l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.040665 7f6432f0b700 10 mon.u124-12@1(electing) e2 ms_get_authorizer for mon
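
A rough way to see which peer the faults point at is to tally the ".fault" entries per remote address. The following is only a sketch: the regular expression is inferred from the fault lines quoted above, and `count_faults` is a hypothetical helper name, not anything shipped with Ceph.

```python
import re
from collections import Counter

# Pattern inferred from the messenger fault lines quoted above, e.g.
# "-- 10.37.124.12:6789/0 >> 10.37.124.11:6789/0 pipe(...).fault".
# [^)]* keeps the match inside one pipe(...) group, so ".accept" entries
# for the same pipe are not counted as faults.
FAULT_RE = re.compile(r"-- (\S+) >> (\S+) pipe\([^)]*\)\.fault")


def count_faults(lines):
    """Return a Counter mapping peer address -> number of pipe faults."""
    counts = Counter()
    for line in lines:
        # finditer copes with several log entries sharing one physical line
        for match in FAULT_RE.finditer(line):
            counts[match.group(2)] += 1
    return counts
```

Passing an open log file works, for example `count_faults(open("ceph-mon.u124-12.log"))`. In the excerpt above every fault is on the connection to 10.37.124.11:6789/0 (mon.0), while the sessions with mon.2 stay open.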


--
______________________________________________________________________
Steffen Thorhauer

Department of Technical and Business Information Systems (ITI)
Faculty of Computer Science (FIN)
  Otto von Guericke University Magdeburg
Universitaetsplatz 2
39106 Magdeburg, Germany

phone: 0391 67 52996
fax: 0391 67 12341
email: [email protected]
url: http://wwwiti.cs.uni-magdeburg.de/~thorhaue
