On 06/03/2014 09:19 AM, Steffen Thorhauer wrote:
Hi,
I'm in the process of upgrading my Ceph cluster from Emperor to Firefly.
After upgrading my 3 mons, one of them is out of quorum.
ceph health detail
HEALTH_WARN 1 mons down, quorum 0,2 u124-11,u124-13
mon.u124-12 (rank 1) addr 10.37.124.12:6789/0 is down (out of quorum)
I have tons of the following log entries in ceph-mon.u124-12.log:
2014-06-03 09:04:50.648461 7f5879635700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x45d9900 sd=9 :46071 s=2 pgs=2259869
cs=182253 l=0 c=0x2f81760).fault, initiating reconnect
2014-06-03 09:04:50.648903 7f587772a700 10 mon.u124-12@1(electing) e2
ms_get_authorizer for mon
Any ideas?
I found some more, different lines in ceph-mon.u124-12.log:
2014-06-03 10:03:26.026831 7f6436e71700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39338 s=2 pgs=2394287 cs=54115
l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.026910 7f6437073700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39338 s=1 pgs=2394287 cs=54116
l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.027314 7f6437073700 10 mon.u124-12@1(electing) e2
ms_get_authorizer for mon
2014-06-03 10:03:26.027558 7f6434910700 10 mon.u124-12@1(electing) e2
join_election
2014-06-03 10:03:26.027570 7f6434910700 10 mon.u124-12@1(electing) e2 _reset
2014-06-03 10:03:26.027575 7f6434910700 10 mon.u124-12@1(electing) e2
cancel_probe_timeout (none scheduled)
2014-06-03 10:03:26.027579 7f6434910700 10 mon.u124-12@1(electing) e2
timecheck_finish
2014-06-03 10:03:26.027598 7f6434910700 10 mon.u124-12@1(electing) e2
scrub_reset
2014-06-03 10:03:26.027615 7f6434910700 10 mon.u124-12@1(electing) e2
start_election
2014-06-03 10:03:26.027619 7f6434910700 10 mon.u124-12@1(electing) e2 _reset
2014-06-03 10:03:26.027623 7f6434910700 10 mon.u124-12@1(electing) e2
cancel_probe_timeout (none scheduled)
2014-06-03 10:03:26.027625 7f6434910700 10 mon.u124-12@1(electing) e2
timecheck_finish
2014-06-03 10:03:26.027628 7f6434910700 10 mon.u124-12@1(electing) e2
scrub_reset
2014-06-03 10:03:26.027631 7f6434910700 10 mon.u124-12@1(electing) e2
cancel_probe_timeout (none scheduled)
2014-06-03 10:03:26.027646 7f6434910700 0 log [INF] : mon.u124-12
calling new monitor election
2014-06-03 10:03:26.027728 7f6434910700 5
mon.u124-12@1(electing).elector(1499) start -- can i be leader?
2014-06-03 10:03:26.027820 7f6434910700 1
mon.u124-12@1(electing).elector(1499) init, last seen epoch 1499
2014-06-03 10:03:26.027988 7f6434910700 20 mon.u124-12@1(electing) e2
have connection
2014-06-03 10:03:26.027993 7f6434910700 20 mon.u124-12@1(electing) e2
ms_dispatch existing session MonSession: mon.2 10.37.124.13:6789/0 is
openallow * for mon.2 10.37.124.13:6789/0
2014-06-03 10:03:26.028008 7f6434910700 20 mon.u124-12@1(electing) e2
caps allow *
2014-06-03 10:03:26.028012 7f6434910700 20 is_capable service=mon
command= exec on cap allow *
2014-06-03 10:03:26.028017 7f6434910700 20 allow so far , doing grant
allow *
2014-06-03 10:03:26.028021 7f6434910700 20 allow all
2014-06-03 10:03:26.028056 7f6434910700 5
mon.u124-12@1(electing).elector(1499) handle_ack from mon.2
2014-06-03 10:03:26.028063 7f6434910700 5
mon.u124-12@1(electing).elector(1499) so far i have
{1=8796093022207,2=68719476735}
2014-06-03 10:03:26.028093 7f6434910700 20 mon.u124-12@1(electing) e2
have connection
2014-06-03 10:03:26.028097 7f6434910700 20 mon.u124-12@1(electing) e2
ms_dispatch existing session MonSession: mon.1 10.37.124.12:6789/0 is
openallow * for mon.1 10.37.124.12:6789/0
2014-06-03 10:03:26.028108 7f6434910700 20 mon.u124-12@1(electing) e2
caps allow *
2014-06-03 10:03:26.028114 7f6434910700 1
mon.u124-12@1(electing).paxos(paxos recovering c 27937488..27938103)
is_readable now=2014-06-03 10:03:26.028115 lease_expire=0.000000 has v0
lc 27938103
2014-06-03 10:03:26.028130 7f6434910700 1
mon.u124-12@1(electing).paxos(paxos recovering c 27937488..27938103)
is_readable now=2014-06-03 10:03:26.028132 lease_expire=0.000000 has v0
lc 27938103
2014-06-03 10:03:26.028580 7f6436e71700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39339 s=2 pgs=2394288 cs=54117
l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.028677 7f6437073700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39339 s=1 pgs=2394288 cs=54118
l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.029003 7f6434910700 20 mon.u124-12@1(electing) e2
have connection
2014-06-03 10:03:26.029011 7f6434910700 20 mon.u124-12@1(electing) e2
ms_dispatch existing session MonSession: mon.2 10.37.124.13:6789/0 is
openallow * for mon.2 10.37.124.13:6789/0
2014-06-03 10:03:26.029028 7f6434910700 20 mon.u124-12@1(electing) e2
caps allow *
2014-06-03 10:03:26.029034 7f6434910700 20 is_capable service=mon
command= exec on cap allow *
2014-06-03 10:03:26.029041 7f6434910700 20 allow so far , doing grant
allow *
2014-06-03 10:03:26.029044 7f6434910700 20 allow all
2014-06-03 10:03:26.029069 7f6434910700 5
mon.u124-12@1(electing).elector(1499) handle_ack from mon.2
2014-06-03 10:03:26.029076 7f6434910700 5
mon.u124-12@1(electing).elector(1499) so far i have
{1=8796093022207,2=68719476735}
2014-06-03 10:03:26.029136 7f6437073700 10 mon.u124-12@1(electing) e2
ms_get_authorizer for mon
2014-06-03 10:03:26.030439 7f6436e71700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39340 s=2 pgs=2394289 cs=54119
l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.030523 7f6437073700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39340 s=1 pgs=2394289 cs=54120
l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.030933 7f6437073700 10 mon.u124-12@1(electing) e2
ms_get_authorizer for mon
2014-06-03 10:03:26.032189 7f6436e71700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39341 s=2 pgs=2394290 cs=54121
l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.032269 7f6437073700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39341 s=1 pgs=2394290 cs=54122
l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.032732 7f6437073700 10 mon.u124-12@1(electing) e2
ms_get_authorizer for mon
2014-06-03 10:03:26.033306 7f6432f0b700 10 mon.u124-12@1(electing) e2
ms_verify_authorizer 10.37.124.11:6789/0 mon protocol 2
2014-06-03 10:03:26.033536 7f6432f0b700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422800 sd=13 :6789 s=0 pgs=0 cs=0 l=0
c=0x340d700).accept connect_seq 54122 vs existing 54122 state wait
2014-06-03 10:03:26.033609 7f6434910700 10 mon.u124-12@1(electing) e2
ms_handle_reset 0x340d700 10.37.124.11:6789/0
2014-06-03 10:03:26.034352 7f6434910700 20 mon.u124-12@1(electing) e2
have connection
2014-06-03 10:03:26.034356 7f6434910700 20 mon.u124-12@1(electing) e2
ms_dispatch existing session MonSession: mon.0 10.37.124.11:6789/0 is
openallow * for mon.0 10.37.124.11:6789/0
2014-06-03 10:03:26.034368 7f6434910700 20 mon.u124-12@1(electing) e2
caps allow *
2014-06-03 10:03:26.034372 7f6434910700 20 is_capable service=mon
command= exec on cap allow *
2014-06-03 10:03:26.034375 7f6434910700 20 allow so far , doing grant
allow *
2014-06-03 10:03:26.034378 7f6434910700 20 allow all
2014-06-03 10:03:26.034393 7f6434910700 5
mon.u124-12@1(electing).elector(1499) handle_propose from mon.0
2014-06-03 10:03:26.034398 7f6434910700 5
mon.u124-12@1(electing).elector(1499) defer to 0
2014-06-03 10:03:26.034599 7f6432f0b700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422800 sd=13 :6789 s=2 pgs=2394291 cs=54123
l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.034703 7f6436e71700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422800 sd=13 :6789 s=1 pgs=2394291 cs=54124
l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.035171 7f6436e71700 10 mon.u124-12@1(electing) e2
ms_get_authorizer for mon
2014-06-03 10:03:26.035351 7f6437073700 10 mon.u124-12@1(electing) e2
ms_verify_authorizer 10.37.124.11:6789/0 mon protocol 2
2014-06-03 10:03:26.035566 7f6437073700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422f80 sd=11 :6789 s=0 pgs=0 cs=0 l=0
c=0x340d860).accept connect_seq 54124 vs existing 54124 state connecting
2014-06-03 10:03:26.035660 7f6434910700 10 mon.u124-12@1(electing) e2
ms_handle_reset 0x340d860 10.37.124.11:6789/0
2014-06-03 10:03:26.036539 7f6437073700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422f80 sd=11 :6789 s=2 pgs=2394292 cs=54125
l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.036627 7f6432f0b700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422f80 sd=11 :6789 s=1 pgs=2394292 cs=54126
l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.037063 7f6432f0b700 10 mon.u124-12@1(electing) e2
ms_get_authorizer for mon
2014-06-03 10:03:26.038382 7f6437073700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422f80 sd=11 :39344 s=2 pgs=2394293 cs=54127
l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.038832 7f6432f0b700 10 mon.u124-12@1(electing) e2
ms_get_authorizer for mon
2014-06-03 10:03:26.040188 7f6437073700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422f80 sd=11 :39345 s=2 pgs=2394294 cs=54129
l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.040275 7f6432f0b700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422f80 sd=11 :39345 s=1 pgs=2394294 cs=54130
l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.040665 7f6432f0b700 10 mon.u124-12@1(electing) e2
ms_get_authorizer for mon
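
Since the excerpt above is long and the mail client wrapped every entry, here is a minimal Python sketch for counting the messenger ".fault" entries per peer address. It is not a Ceph tool. The helper names (join_wrapped, count_faults) and both regular expressions are my own assumptions, derived only from the line layout visible in the log above.

```python
import re
from collections import Counter

# Assumption: a new log entry starts with a timestamp such as
# "2014-06-03 10:03:26.026831"; anything else is a wrapped continuation.
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ ")
# Assumption: messenger faults look like
# "-- <local addr> >> <peer addr> pipe(...).fault"
FAULT = re.compile(r"-- (\S+) >> (\S+) pipe\(.*\)\.fault")


def join_wrapped(lines):
    """Re-join entries that were wrapped across several lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def count_faults(lines):
    """Count messenger fault entries per peer address."""
    counts = Counter()
    for entry in join_wrapped(lines):
        match = FAULT.search(entry)
        if match:
            counts[match.group(2)] += 1
    return counts


# Sample taken from the first entries of the log excerpt above.
sample = """\
2014-06-03 10:03:26.026831 7f6436e71700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39338 s=2 pgs=2394287 cs=54115
l=0 c=0x340b4a0).fault, initiating reconnect
2014-06-03 10:03:26.026910 7f6437073700 0 -- 10.37.124.12:6789/0 >>
10.37.124.11:6789/0 pipe(0x3422a80 sd=11 :39338 s=1 pgs=2394287 cs=54116
l=0 c=0x340b4a0).fault
2014-06-03 10:03:26.027314 7f6437073700 10 mon.u124-12@1(electing) e2
ms_get_authorizer for mon
""".splitlines()

for peer, n in count_faults(sample).items():
    print(peer, n)
```

In the excerpt above, every fault points at 10.37.124.11:6789/0 (mon.0), while the session with mon.2 on 10.37.124.13 shows no faults.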
--
______________________________________________________________________
Steffen Thorhauer
Department of Technical and Business Information Systems (ITI)
Faculty of Computer Science (FIN)
Otto von Guericke University Magdeburg
Universitaetsplatz 2
39106 Magdeburg, Germany
phone: 0391 67 52996
fax: 0391 67 12341
email: [email protected]
url: http://wwwiti.cs.uni-magdeburg.de/~thorhaue