Generally they clean up slowly, deleting 30 maps each time the maps update.
You can speed that up by creating artificial map updates, for example by
setting a pool option to the value it already has.  What it sounds like
happened to you is that your mon crashed and restarted.  If it has the
setting to compact the mon store on start (mon_compact_on_start), the
restart would force it to go through and clean everything up in one go.
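As a sketch of what that looks like (pool name and value here are
placeholders; check your own with 'ceph osd pool get' first so the "change"
is a no-op):

    # re-apply the existing value to bump the osdmap epoch without
    # changing any behaviour
    ceph osd pool get rbd min_size
    ceph osd pool set rbd min_size 2    # use the value the 'get' returned

    # compaction on restart is controlled by the mon option
    # mon_compact_on_start = true; you can also compact a mon's
    # store on demand:
    ceph tell mon.a compact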

I generally plan my backfilling not to take longer than a week.  Anything
longer than that is pretty rough on the mons.  You can achieve that by
bringing in new storage with a CRUSH weight of 0.0 and increasing it
gradually, as opposed to adding it at its full weight and having everything
move at once.
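A rough sketch of the gradual approach (the OSD id, host, and weight steps
are just examples; pick increments your cluster can absorb):

    # add the new OSD with zero CRUSH weight, so no data moves yet
    ceph osd crush add osd.42 0.0 host=newhost

    # then raise the weight in steps, waiting for active+clean in between
    ceph osd crush reweight osd.42 0.5
    ceph osd crush reweight osd.42 1.0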

On Thu, May 17, 2018 at 12:56 PM Thomas Byrne - UKRI STFC <
tom.by...@stfc.ac.uk> wrote:

> That seems like a sane way to do it, thanks for the clarification Wido.
>
> As a follow-up, do you have any feeling as to whether the trimming is a
> particularly intensive task? We just had a fun afternoon where the monitors
> became unresponsive (no ceph status etc.) for several hours, seemingly due
> to the leader's monitor process consuming all available RAM+swap (64GB+32GB)
> on that monitor. This was then followed by the actual trimming of the
> stores (26GB->11GB), which took a few minutes and happened simultaneously
> across the monitors.
>
> If this is something to be expected, it'll be a good reason to plan our
> long backfills much more carefully in the future!
>
> > -----Original Message-----
> > From: ceph-users <ceph-users-boun...@lists.ceph.com> On Behalf Of Wido
> > den Hollander
> > Sent: 17 May 2018 15:40
> > To: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] A question about HEALTH_WARN and monitors
> > holding onto cluster maps
> >
> >
> >
> > On 05/17/2018 04:37 PM, Thomas Byrne - UKRI STFC wrote:
> > > Hi all,
> > >
> > >
> > >
> > > As far as I understand, the monitor stores will grow while the cluster
> > > is not HEALTH_OK, as they hold onto all cluster maps. Is this true for
> > > all HEALTH_WARN reasons? Our cluster recently went into HEALTH_WARN due
> > > to a few weeks of backfilling onto new hardware pushing the monitors'
> > > data stores over the default 15GB threshold. Are they now prevented
> > > from shrinking until I raise the threshold above their current size?
> > >
> >
> > No, monitors will trim their data store once all PGs are active+clean,
> not when
> > they are HEALTH_OK.
> >
> > So a 'noout' flag triggers a WARN, for example, but that doesn't prevent
> > the MONs from trimming.
> >
> > Wido
> >
> > >
> > >
> > > Cheers
> > >
> > > Tom
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >