The Hammer ticket was https://tracker.ceph.com/issues/13990. The problem there was that when OSDs asked each other which maps they still needed to keep, a leak could set that value to NULL, and the affected OSD would then never delete any OSD maps again until it was restarted.
On Thu, Aug 30, 2018 at 3:09 AM Joao Eduardo Luis <j...@suse.de> wrote:
> On 08/30/2018 09:28 AM, Dan van der Ster wrote:
> > Hi,
> >
> > Is anyone else seeing rocksdb mon stores slowly growing to >15GB,
> > eventually triggering the 'mon is using a lot of disk space' warning?
> >
> > Since upgrading to luminous, we've seen this happen at least twice.
> > Each time, we restart all the mons and then the stores slowly trim down to
> > <500MB. We have 'mon compact on start = true', but it's not the
> > compaction that's shrinking the rocksdbs -- the space used seems to
> > decrease over a few minutes only after *all* mons have been restarted.
> >
> > This reminds me of a hammer-era issue where references to trimmed maps
> > were leaking -- I can't find that bug at the moment, though.
>
> Next time this happens, mind listing the store contents and checking whether you
> are holding way too many osdmaps? You shouldn't be holding more osdmaps
> than the default IF the cluster is healthy and all the pgs are clean.
>
> I chased a bug pertaining to this last year and even got a patch, but then
> was unable to reproduce it. I didn't pursue merging the patch any further
> (I think I may still have an open PR for it, though), simply because it
> was no longer clear whether it was needed.
>
> -Joao
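For a quick check of how many osdmap epochs the mons are holding, something like the sketch below can help. It assumes the luminous-era `ceph report` output includes the osdmap_first_committed / osdmap_last_committed fields and that the `ceph` CLI can reach the cluster; the ~500 figure is the usual mon_min_osdmap_epochs default, so treat the threshold as a rough heuristic rather than an exact limit:

    #!/usr/bin/env python
    # Sketch: estimate how many osdmap epochs the mons are holding by
    # comparing osdmap_first_committed and osdmap_last_committed from
    # `ceph report`.  Field names are assumed from luminous-era output.
    import json
    import subprocess

    # `ceph report` prints JSON on stdout (the "report <crc>" line goes to stderr).
    report = json.loads(subprocess.check_output(["ceph", "report"]))

    first = report["osdmap_first_committed"]
    last = report["osdmap_last_committed"]
    held = last - first + 1

    print("mons hold %d osdmap epochs (%d..%d)" % (held, first, last))
    if held > 1000:
        # Far beyond the ~500 epochs a healthy cluster typically retains:
        # trimming may be stuck -- check for unclean pgs before restarting mons.
        print("warning: mons appear to be holding far more osdmaps than the default")

If the held count keeps climbing while all pgs are active+clean, that matches the symptom described above, and listing the store contents (e.g. with ceph-kvstore-tool while a mon is stopped) can confirm that the extra space is indeed old osdmap keys.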