> On 26/03/2015, at 23.36, Somnath Roy <[email protected]> wrote:
>
> Got most of it, thanks!
> But I still don't get why, when the second node is down, the client is not
> able to connect with a single monitor left in the cluster.
> 1 monitor can form a quorum and should be sufficient for a cluster to run.

To have quorum you need a strict majority (more than 50%) of the monitors in
the monmap, which is not possible with one monitor out of two: the required
majority is floor(2/2) + 1 = 2, and only 1 is left. Hence the usual
recommendation of at least 3 monitors, so one can fail while the remaining
two still form a majority.
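To make that arithmetic concrete, here is a quick shell sketch (the monitor
counts below are just illustrative, not taken from this cluster):

$ # quorum needs a strict majority of the monmap: floor(n/2) + 1
$ for n in 1 2 3 4 5; do echo "mons=$n quorum_needed=$((n/2 + 1)) can_lose=$((n - n/2 - 1))"; done
mons=1 quorum_needed=1 can_lose=0
mons=2 quorum_needed=2 can_lose=0
mons=3 quorum_needed=2 can_lose=1
mons=4 quorum_needed=3 can_lose=1
mons=5 quorum_needed=3 can_lose=2

A 2-monitor cluster therefore tolerates no monitor failures at all, which is
exactly the situation described below.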
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Gregory Farnum [mailto:[email protected]]
> Sent: Thursday, March 26, 2015 3:29 PM
> To: Somnath Roy
> Cc: Lee Revell; [email protected]
> Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs down
>
> On Thu, Mar 26, 2015 at 3:22 PM, Somnath Roy <[email protected]> wrote:
>> Greg,
>> Couple of dumb questions, maybe.
>>
>> 1. If you see, the clients are connecting fine with two monitors in the
>> cluster. 2 monitors can never form a quorum, but 1 can, so why, with 1
>> monitor (which is I guess what happens after taking 2 nodes down), is it
>> not able to connect?
>
> A quorum is a strict majority of the total membership. 2 monitors can form a
> quorum just fine if the total membership is either 2 or 3.
> (As long as those two agree on every action, it cannot be lost.)
>
> We don't *recommend* configuring systems with an even number of monitors,
> because it increases the number of total possible failures without increasing
> the number of failures that can be tolerated. (3 monitors require 2 in quorum
> and so tolerate 1 failure; 4 require 3 and still tolerate only 1. Same for 5
> and 6, 7 and 8, etc.)
>
>>
>> 2. Also, my understanding is that while IO is going on there is *no* monitor
>> interaction on that path, so why will the client IO be stopped because the
>> monitor quorum is not there? If min_size = 1 is properly set, it should be
>> able to serve IO as long as 1 OSD (node) is up, shouldn't it?
>
> Well, the remaining OSD won't be able to process IO because it's lost its
> peers, and it can't reach any monitors to do updates or get new maps.
> (Monitors which are not in quorum will not allow clients to connect.)
> The clients will eventually stop issuing IO if they know they can't reach a
> monitor, although I don't remember exactly how that's triggered.
>
> In this particular case, though, the client probably just tried to do an op
> against the dead OSD, realized it couldn't, and tried to fetch a map from the
> monitors. When that failed it went into search mode, which is what the logs
> are showing you.
> -Greg
>
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: ceph-users [mailto:[email protected]] On Behalf
>> Of Gregory Farnum
>> Sent: Thursday, March 26, 2015 2:40 PM
>> To: Lee Revell
>> Cc: [email protected]
>> Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs down
>>
>> On Thu, Mar 26, 2015 at 2:30 PM, Lee Revell <[email protected]> wrote:
>>> On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum <[email protected]> wrote:
>>>>
>>>> Has the OSD actually been detected as down yet?
>>>>
>>>
>>> I believe it has, however I can't directly check because "ceph health"
>>> starts to hang when I down the second node.
>>
>> Oh. You need to keep a quorum of your monitors running (just the monitor
>> processes, not of everything in the system) or nothing at all is going to
>> work. That's how we prevent split-brain issues.
>>
>>>
>>>>
>>>> You'll also need to set that min_size on your existing pools ("ceph
>>>> osd pool set <pool> min_size 1" or similar) to change their
>>>> behavior; the config option only takes effect for newly-created
>>>> pools. (Thus the "default".)
>>>
>>>
>>> I've done this, however the behavior is the same:
>>>
>>> $ for f in `ceph osd lspools | sed 's/[0-9]//g' | sed 's/,//g'`; do ceph osd pool set $f min_size 1; done
>>> set pool 0 min_size to 1
>>> set pool 1 min_size to 1
>>> set pool 2 min_size to 1
>>> set pool 3 min_size to 1
>>> set pool 4 min_size to 1
>>> set pool 5 min_size to 1
>>> set pool 6 min_size to 1
>>> set pool 7 min_size to 1
>>>
>>> $ ceph -w
>>>     cluster db460aa2-5129-4aaa-8b2e-43eac727124e
>>>      health HEALTH_WARN 1 mons down, quorum 0,1 ceph-node-1,ceph-node-2
>>>      monmap e3: 3 mons at {ceph-node-1=192.168.122.121:6789/0,ceph-node-2=192.168.122.131:6789/0,ceph-node-3=192.168.122.141:6789/0},
>>>             election epoch 194, quorum 0,1 ceph-node-1,ceph-node-2
>>>      mdsmap e94: 1/1/1 up {0=ceph-node-1=up:active}
>>>      osdmap e362: 3 osds: 2 up, 2 in
>>>       pgmap v5913: 840 pgs, 8 pools, 7441 MB data, 994 objects
>>>             25329 MB used, 12649 MB / 40059 MB avail
>>>                  840 active+clean
>>>
>>> 2015-03-26 17:23:56.009938 mon.0 [INF] pgmap v5913: 840 pgs: 840 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail
>>> 2015-03-26 17:25:51.042802 mon.0 [INF] pgmap v5914: 840 pgs: 840 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail; 0 B/s rd, 260 kB/s wr, 13 op/s
>>> 2015-03-26 17:25:56.046491 mon.0 [INF] pgmap v5915: 840 pgs: 840 active+clean; 7441 MB data, 25333 MB used, 12645 MB / 40059 MB avail; 0 B/s rd, 943 kB/s wr, 38 op/s
>>> 2015-03-26 17:26:01.058167 mon.0 [INF] pgmap v5916: 840 pgs: 840 active+clean; 7441 MB data, 25335 MB used, 12643 MB / 40059 MB avail; 0 B/s rd, 10699 kB/s wr, 621 op/s
>>>
>>> <this is where I kill the second OSD>
>>>
>>> 2015-03-26 17:26:26.778461 7f4ebeffd700  0 monclient: hunting for new mon
>>> 2015-03-26 17:26:30.701099 7f4ec45f5700  0 -- 192.168.122.111:0/1007741 >> 192.168.122.141:6789/0 pipe(0x7f4ec0023200 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4ec0023490).fault
>>> 2015-03-26 17:26:42.701154 7f4ec44f4700  0 -- 192.168.122.111:0/1007741 >> 192.168.122.131:6789/0 pipe(0x7f4ec00251b0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4ec0025440).fault
>>>
>>> And all writes block until I bring back an OSD.
>>>
>>> Lee

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
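For completeness, the effect of those per-pool min_size commands can be
checked directly; a minimal sketch, assuming a pool named rbd (the pool names
are not shown by name in the output above):

$ ceph osd pool get rbd min_size
min_size: 1

Note that min_size only controls how many replicas a PG needs in order to
accept IO; it cannot help once the monitors themselves have lost quorum, which
is why the writes above still block.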

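Related to the "ceph health starts to hang" point above: once quorum is lost,
commands that go through the cluster will hang, but a surviving monitor can
still be asked for its own view over its admin socket. A sketch, assuming the
monitor is named ceph-node-1, the default socket path is in use, and the
command is run on that monitor's host:

$ ceph daemon mon.ceph-node-1 mon_status
$ # equivalently, pointing at the socket directly:
$ ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node-1.asok mon_status

The reported state (probing, electing, leader, or peon) and the monmap it
returns make it easy to confirm that the monitor is merely stuck outside
quorum rather than down.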