On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum <[email protected]> wrote:
> On Monday, July 16, 2012 at 11:55 AM, Andrey Korolyov wrote:
>> On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum <[email protected]> wrote:
>> > "ceph pg set_full_ratio 0.95"
>> > "ceph pg set_nearfull_ratio 0.94"
>> >
>> > On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:
>> > > On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum <[email protected]> wrote:
>> > > > On Saturday, July 14, 2012 at 7:20 AM, Andrey Korolyov wrote:
>> > > > > On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil <[email protected]> wrote:
>> > > > > > On Fri, 13 Jul 2012, Gregory Farnum wrote:
>> > > > > > > On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov <[email protected]> wrote:
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > Recently I've reduced my test suite from 6 to 4 osds (at ~60%
>> > > > > > > > usage) on a six-node cluster, and I removed a bunch of rbd
>> > > > > > > > objects during recovery to avoid overfilling. Right now I'm
>> > > > > > > > constantly receiving a warning about a nearfull state on a
>> > > > > > > > non-existent osd:
>> > > > > > > >
>> > > > > > > > health HEALTH_WARN 1 near full osd(s)
>> > > > > > > > monmap e3: 3 mons at
>> > > > > > > > {0=192.168.10.129:6789/0,1=192.168.10.128:6789/0,2=192.168.10.127:6789/0},
>> > > > > > > > election epoch 240, quorum 0,1,2 0,1,2
>> > > > > > > > osdmap e2098: 4 osds: 4 up, 4 in
>> > > > > > > > pgmap v518696: 464 pgs: 464 active+clean; 61070 MB data,
>> > > > > > > > 181 GB used, 143 GB / 324 GB avail
>> > > > > > > > mdsmap e181: 1/1/1 up {0=a=up:active}
>> > > > > > > >
>> > > > > > > > HEALTH_WARN 1 near full osd(s)
>> > > > > > > > osd.4 is near full at 89%
>> > > > > > > >
>> > > > > > > > Needless to say, osd.4 remains only in ceph.conf, not in the
>> > > > > > > > crushmap. The reduction was done 'on-line', i.e. without
>> > > > > > > > restarting the entire cluster.
>> > > > > > >
>> > > > > > > Whoops! It looks like Sage has written some patches to fix this,
>> > > > > > > but for now you should be good if you just update your ratios to
>> > > > > > > a larger number, and then bring them back down again. :)
>> > > > > >
>> > > > > > Restarting ceph-mon should also do the trick.
>> > > > > >
>> > > > > > Thanks for the bug report!
>> > > > > > sage
>> > > > >
>> > > > > Should I restart mons simultaneously?
>> > > >
>> > > > I don't think restarting will actually do the trick for you — you
>> > > > actually will need to set the ratios again.
>> > > >
>> > > > > Restarting one by one has no effect, same as filling up the data
>> > > > > pool to ~95 percent (btw, when I deleted this 50Gb file on cephfs,
>> > > > > the mds was stuck permanently and usage remained the same until I
>> > > > > dropped and recreated the data pool - I hope it's one of the known
>> > > > > posix layer bugs). I also deleted the entry from the config and
>> > > > > then restarted the mons, with no effect. Any suggestions?
>> > > >
>> > > > I'm not sure what you're asking about here?
>> > > > -Greg
>> > >
>> > > Oh, sorry, I misread and thought that you suggested filling up the
>> > > osds. How can I set the full/nearfull ratios correctly?
>> > >
>> > > $ ceph injectargs '--mon_osd_full_ratio 96'
>> > > parsed options
>> > > $ ceph injectargs '--mon_osd_near_full_ratio 94'
>> > > parsed options
>> > >
>> > > ceph pg dump | grep 'full'
>> > > full_ratio 0.95
>> > > nearfull_ratio 0.85
>> > >
>> > > Setting the parameters in ceph.conf and then restarting the mons does
>> > > not affect the ratios either.
>> >
>>
>> Thanks, it worked, but setting the values back brings the warning back.
>
> Hrm. That shouldn't be possible if the OSD has been removed. How did you
> take it out? It sounds like maybe you just marked it in the OUT state (and
> turned it off quite quickly) without actually taking it out of the cluster?
> -Greg
That is not how I did the removal - first I marked the osds (4 and 5, on the
same host) out, then rebuilt the crushmap, and then killed the osd processes.
As I mentioned before, osd.4 does not exist in the crushmap and therefore it
shouldn't be reported at all (theoretically).
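For anyone hitting the same phantom-osd warning: marking an osd out and
rebuilding the crushmap still leaves its id in the osdmap and its key in the
monitors, which is one plausible reason the stale "near full" state survives.
Below is a minimal sketch of a complete removal, assuming the argonaut-era
(0.48) CLI; osd.4 is reused from the thread purely as an example, and the
daemon-stop step depends on how the host actually runs ceph-osd.

  # stop placing data on the osd and let recovery drain it
  ceph osd out 4

  # stop the daemon (init script shown; killing ceph-osd also works)
  /etc/init.d/ceph stop osd.4

  # remove it from the crushmap, delete its cephx key, free the id
  ceph osd crush remove osd.4
  ceph auth del osd.4
  ceph osd rm 4

  # finally, drop the [osd.4] section from ceph.conf by hand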
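The workaround Greg describes (raise the ratios, then lower them again) goes
through the pg monitor commands quoted at the top of the thread, not through
injectargs; judging by the pg dump output above, the injected mon_osd_*_ratio
options never reached the ratios stored in the pgmap, and the injected values
were percentages while the stored ratios are fractions. A rough sketch of the
round trip, with 0.97/0.96 as arbitrary temporary values and 0.95/0.85 being
the defaults this cluster reported:

  # temporarily raise the thresholds so the stale warning clears
  ceph pg set_full_ratio 0.97
  ceph pg set_nearfull_ratio 0.96

  # check what the pgmap actually carries
  ceph pg dump | grep 'full'

  # once the phantom osd.4 is really gone, restore the previous values
  ceph pg set_full_ratio 0.95
  ceph pg set_nearfull_ratio 0.85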
