Greg,
A couple of dumb questions, maybe.

1. As you can see, the clients connect fine with two monitors in the cluster. 
Two monitors can never form a quorum, but one can, so why, with only one 
monitor left (which I guess is what happens after taking two nodes down), is 
the client unable to connect?

2. Also, my understanding is that while IO is in flight there is *no* monitor 
interaction on that path, so why does client IO stop just because monitor 
quorum is lost? If min_size = 1 is properly set, it should be able to serve IO 
as long as one OSD (node) is up, shouldn't it? (See the sketch just below for a 
quick way to check both.)
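
For reference, a minimal sketch of those checks, assuming the standard ceph CLI 
and the default admin socket (the monitor name is taken from the monmap further 
down; "rbd" is just an example pool name):

$ ceph daemon mon.ceph-node-1 mon_status   # run on that monitor's host; uses the local admin socket, so it answers even without quorum
$ ceph osd pool get rbd min_size           # goes through the monitors, so it only answers while a quorum exists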

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Gregory Farnum
Sent: Thursday, March 26, 2015 2:40 PM
To: Lee Revell
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs down

On Thu, Mar 26, 2015 at 2:30 PM, Lee Revell <rlrev...@gmail.com> wrote:
> On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum <g...@gregs42.com> wrote:
>>
>> Has the OSD actually been detected as down yet?
>>
>
> I believe it has, however I can't directly check because "ceph health"
> starts to hang when I down the second node.

Oh. You need to keep a quorum of your monitors running (just the monitor 
processes, not of everything in the system) or nothing at all is going to work. 
That's how we prevent split brain issues.
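
For concreteness, an illustrative calculation: a quorum is a strict majority of 
the monitors in the monmap, i.e. floor(n/2) + 1.

    n = 3  ->  quorum needs 2 monitors, so only 1 monitor failure is tolerated
    n = 5  ->  quorum needs 3 monitors, so up to 2 monitor failures are tolerated

With two of the three monitor hosts down, the surviving monitor cannot form a 
quorum, and anything that needs the cluster maps (including "ceph health" and 
clients waiting for a fresh osdmap) will block.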

>
>>
>> You'll also need to set that min size on your existing pools ("ceph
>> osd pool <pool> set min_size 1" or similar) to change their behavior;
>> the config option only takes effect for newly-created pools. (Thus the
>> "default".)
>
>
> I've done this; however, the behavior is the same:
>
> $ for f in `ceph osd lspools | sed 's/[0-9]//g' | sed 's/,//g'`; do ceph osd pool set $f min_size 1; done
> set pool 0 min_size to 1
> set pool 1 min_size to 1
> set pool 2 min_size to 1
> set pool 3 min_size to 1
> set pool 4 min_size to 1
> set pool 5 min_size to 1
> set pool 6 min_size to 1
> set pool 7 min_size to 1
>
> $ ceph -w
>     cluster db460aa2-5129-4aaa-8b2e-43eac727124e
>      health HEALTH_WARN 1 mons down, quorum 0,1 ceph-node-1,ceph-node-2
>      monmap e3: 3 mons at
> {ceph-node-1=192.168.122.121:6789/0,ceph-node-2=192.168.122.131:6789/0
> ,ceph-node-3=192.168.122.141:6789/0},
> election epoch 194, quorum 0,1 ceph-node-1,ceph-node-2
>      mdsmap e94: 1/1/1 up {0=ceph-node-1=up:active}
>      osdmap e362: 3 osds: 2 up, 2 in
>       pgmap v5913: 840 pgs, 8 pools, 7441 MB data, 994 objects
>             25329 MB used, 12649 MB / 40059 MB avail
>                  840 active+clean
>
> 2015-03-26 17:23:56.009938 mon.0 [INF] pgmap v5913: 840 pgs: 840 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail
> 2015-03-26 17:25:51.042802 mon.0 [INF] pgmap v5914: 840 pgs: 840 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail; 0 B/s rd, 260 kB/s wr, 13 op/s
> 2015-03-26 17:25:56.046491 mon.0 [INF] pgmap v5915: 840 pgs: 840 active+clean; 7441 MB data, 25333 MB used, 12645 MB / 40059 MB avail; 0 B/s rd, 943 kB/s wr, 38 op/s
> 2015-03-26 17:26:01.058167 mon.0 [INF] pgmap v5916: 840 pgs: 840 active+clean; 7441 MB data, 25335 MB used, 12643 MB / 40059 MB avail; 0 B/s rd, 10699 kB/s wr, 621 op/s
>
> <this is where I kill the second OSD>
>
> 2015-03-26 17:26:26.778461 7f4ebeffd700  0 monclient: hunting for new mon
> 2015-03-26 17:26:30.701099 7f4ec45f5700  0 -- 192.168.122.111:0/1007741 >> 192.168.122.141:6789/0 pipe(0x7f4ec0023200 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4ec0023490).fault
> 2015-03-26 17:26:42.701154 7f4ec44f4700  0 -- 192.168.122.111:0/1007741 >> 192.168.122.131:6789/0 pipe(0x7f4ec00251b0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f4ec0025440).fault
>
> And all writes block until I bring back an OSD.
>
> Lee
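
For what it's worth, the monclient lines above show the client retrying 
192.168.122.141 and 192.168.122.131, i.e. the two monitor hosts that went down 
along with their OSDs, which fits the explanation that the blockage is loss of 
monitor quorum rather than the min_size setting. Once a quorum exists again, 
the change can be read back to confirm it stuck; a minimal sketch, with "rbd" 
again just an example pool name:

$ ceph osd dump | grep '^pool'      # each pool line shows its current size and min_size
$ ceph osd pool get rbd min_size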
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
