That thread looks like the right one. So far I haven't needed to restart the OSDs for the churn trick to work. I bet you're right that something thinks it still needs one of the old osdmaps on your cluster. Last night our cluster finished another round of expansions and we're seeing up to 49,272 osdmaps hanging around. The churn trick seems to be working again too.
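A figure like that is just the gap between the oldest and newest osdmap epochs the monitors still hold (in Luminous these show up as the osdmap_first_committed / osdmap_last_committed fields in `ceph report` output). A minimal sketch of the arithmetic, with made-up epoch numbers chosen only for illustration:

```shell
# Hypothetical epochs; on a real cluster, pull these from 'ceph report'
# (osdmap_first_committed / osdmap_last_committed).
first_committed=3000
last_committed=52272

# Number of osdmap epochs the cluster is still retaining:
echo "untrimmed osdmaps: $(( last_committed - first_committed ))"
# -> untrimmed osdmaps: 49272
```

Watching that difference shrink is a quick way to confirm the churn trick is actually trimming maps.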
Bryan

From: Dan van der Ster <[email protected]>
Date: Thursday, January 10, 2019 at 3:13 AM
To: Bryan Stillwell <[email protected]>
Cc: ceph-users <[email protected]>
Subject: Re: [ceph-users] osdmaps not being cleaned up in 12.2.8

Hi Bryan,

I think this is the old hammer thread you refer to:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/013060.html

We also have osdmaps accumulating on v12.2.8 -- ~12,000 per OSD at the moment. I'm trying to churn the osdmaps like before, but our maps are not being trimmed. Did you need to restart the OSDs before the churn trick would work? If so, it seems that something is holding references to old maps, like that old hammer issue.

Cheers, Dan

On Tue, Jan 8, 2019 at 5:39 PM Bryan Stillwell <[email protected]> wrote:

I was able to get the osdmaps to slowly trim (maybe 50 would trim with each change) by making small changes to the CRUSH map like this:

for i in {1..100}; do
    ceph osd crush reweight osd.1754 4.00001
    sleep 5
    ceph osd crush reweight osd.1754 4
    sleep 5
done

I believe this was the solution Dan came across back in the hammer days. It works, but it's not ideal for sure. Across the cluster it freed up around 50TB of data!

Bryan

From: ceph-users <[email protected]> on behalf of Bryan Stillwell <[email protected]>
Date: Monday, January 7, 2019 at 2:40 PM
To: ceph-users <[email protected]>
Subject: [ceph-users] osdmaps not being cleaned up in 12.2.8

I have a cluster with over 1900 OSDs running Luminous (12.2.8) that isn't cleaning up old osdmaps after doing an expansion.
This is even after the cluster became 100% active+clean:

# find /var/lib/ceph/osd/ceph-1754/current/meta -name 'osdmap*' | wc -l
46181

With each osdmap being over 600KB in size, this adds up:

# du -sh /var/lib/ceph/osd/ceph-1754/current/meta
31G     /var/lib/ceph/osd/ceph-1754/current/meta

I remember running into this during the hammer days:
http://tracker.ceph.com/issues/13990

Did something change recently that may have broken this fix?

Thanks,
Bryan
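Those two numbers are consistent with each other; a quick back-of-the-envelope check (assuming roughly 600 KB per map, per the figure quoted in the mail):

```shell
# 46181 osdmaps at ~600 KB each should land in the same ballpark as
# the 31G that du reports (maps average somewhat more than 600 KB).
maps=46181
kb_per_map=600
echo "$(( maps * kb_per_map / 1024 / 1024 )) GB"
# -> 26 GB
```

So essentially all of that 31G is old osdmaps that should have been trimmed.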
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
