[ceph-users] Best practice for osd_min_down_reporters

Wido den Hollander Tue, 07 May 2013 05:59:19 -0700

Hi,

I was just upgrading a 9-node, 36-OSD cluster, running the next branch from a few days ago, to the Cuttlefish release.

While rebooting the nodes one by one and waiting for active+clean for all PGs, I noticed that some weird things happened.


I reboot a node and see:

"osdmap e580: 36 osds: 4 up, 36 in"

After a few seconds I see all the OSDs reporting:

osd.33 [WRN] map e582 wrongly marked me down
osd.5 [WRN] map e582 wrongly marked me down
osd.6 [WRN] map e582 wrongly marked me down

I didn't check what was happening here, but it seems like the 4 OSDs that were shutting down reported everybody but themselves down (I should have printed ceph osd tree).


Thinking about that, there are the following configuration options:

OPTION(osd_min_down_reporters, OPT_INT, 1)
OPTION(osd_min_down_reports, OPT_INT, 3)

So if just one OSD sends 3 reports, it can mark anybody in the cluster down, right?
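
A minimal sketch of the reading above, in Python. This is not the monitor's actual code: the function name would_mark_down and the rule "enough distinct reporters AND enough total reports" are assumptions made only to illustrate how the two thresholds might combine.

```python
def would_mark_down(reports, min_reporters=1, min_reports=3):
    """Hypothetical model of the two thresholds (an assumption, not Ceph code).

    reports: list of reporter OSD ids, one entry per failure report
    received about a single target OSD.
    """
    distinct_reporters = set(reports)
    return (len(distinct_reporters) >= min_reporters
            and len(reports) >= min_reports)

# With the defaults quoted above, one OSD sending three reports is enough.
single_reporter = would_mark_down([33, 33, 33])

# With min_reporters=5, the 4 OSDs of one host cannot reach the threshold alone.
one_host = would_mark_down([0, 1, 2, 3, 0, 1], min_reporters=5)
```

Under this model, single_reporter is True and one_host is False, which is the behaviour the question is asking about.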

Shouldn't the best practice be to set osd_min_down_reporters to at least numosdperhost+1?


In this case I have 4 OSDs per host, so shouldn't I use 5 here?
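
The numosdperhost+1 rule can be sketched in Python as follows. The helper name suggested_min_down_reporters and the host/OSD layout are hypothetical, chosen to match the 9-host, 4-OSDs-per-host cluster described here; this only does the arithmetic and says nothing about whether the value is a good default.

```python
def suggested_min_down_reporters(osds_by_host):
    """Return (largest number of OSDs on any single host) + 1.

    Hypothetical helper: it only encodes the rule of thumb proposed above.
    """
    return max(len(osds) for osds in osds_by_host.values()) + 1

# Hypothetical layout: 9 hosts with 4 OSDs each, 36 OSDs in total.
osds_by_host = {f"node{h}": [h * 4 + i for i in range(4)] for h in range(9)}

suggestion = suggested_min_down_reporters(osds_by_host)
```

For hosts with uneven OSD counts, taking the maximum means that even the largest host cannot supply enough reporters by itself.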

This might as well be a bug, but it still doesn't seem right that all the OSDs on one machine can mark the whole cluster down.


--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
