Udo, I think you might have better luck using "ceph osd set noout" before doing maintenance, rather than "ceph osd set nodown". With noout, the node's OSDs are still marked down when they stop (so clients don't direct I/O at them), but they are never marked out, so recovery/backfill doesn't begin.
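A minimal sketch of that maintenance sequence (these flags require a live cluster and admin keyring, so treat this as an ops fragment rather than something to paste blindly):

```shell
# Prevent OSDs from being marked "out" (which would trigger backfill)
# while the node is down for maintenance:
ceph osd set noout

# ... shut the node down, do the maintenance, bring it back up ...
# Its OSDs are marked "down" while stopped, so client I/O avoids them,
# but no recovery starts because they are never marked "out".

# Once the node's OSDs have rejoined and "ceph health" is clean again:
ceph osd unset noout
```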
-Aaron

On Tue, Jan 21, 2014 at 10:01 AM, Udo Lembke <[email protected]> wrote:
> Hi,
> I need a little bit of help.
> We have a 4-node ceph cluster and the clients run into trouble if one
> node is down (due to maintenance).
>
> After the node is switched on again ceph health shows (for a little time):
> HEALTH_WARN 4 pgs incomplete; 14 pgs peering; 370 pgs stale; 12 pgs
> stuck unclean; 36 requests are blocked > 32 sec; nodown flag(s) set
>
> nodown is set due to maintenance, and in the global section of ceph.conf
> the following is defined to protect against such things:
> osd pool default min size = 1 # Allow writing one copy in a degraded state.
>
> And in the logfile I see messages like:
> 2014-01-21 18:00:18.566712 osd.46 172.20.2.14:6821/12805 17 : [WRN] 6
> slow requests, 3 included below; oldest blocked for > 180.734141 secs
> 2014-01-21 18:00:18.566717 osd.46 172.20.2.14:6821/12805 18 : [WRN] slow
> request 120.523231 seconds old, received at 2014-01-21
>
> Due to the message:
> 2014-01-21 18:00:21.126693 mon.0 172.20.2.11:6789/0 410241 : [INF] pgmap
> v8331119: 4808 pgs: 4805 active+clean, 1 active+clean+scrubbing, 2
> active+clean+scrubbing+deep; 57849 GB data, 113 TB used, 77841 GB / 189
> TB avail; 2304 B/s wr, 0 op/s
> I assume it has something to do with scrubbing and not writing from the
> VMs?
>
> Are there any switches which protect against this behavior?
>
> regards
>
> Udo
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Aaron Ten Clay
http://www.aarontc.com/
