mon_compact_on_start was not changed from default (false). From the logs, it 
looks like the monitor with the excessive resource usage (mon1) was up and 
winning the majority of elections throughout the period of unresponsiveness, 
with other monitors occasionally winning an election without mon1 participating
(I’m guessing because it was too unresponsive to take part).
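
For anyone following along, something like this should show the current value
and force a compaction by hand (a rough sketch; mon.mon1 is just an example ID,
and the daemon command assumes you’re on that mon’s host with access to its
admin socket):

    # Check the current value via the mon's admin socket (on the mon host)
    ceph daemon mon.mon1 config get mon_compact_on_start

    # Ask a specific monitor to compact its store on demand, without a restart
    ceph tell mon.mon1 compact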

That’s interesting about the false map updates. We had a short networking blip 
(caused by me) on some monitors shortly before the trouble started, which 
caused some monitors to start calling frequent (every few seconds) elections. 
Could this rapid creation of new monmaps have the same effect as updating pool
settings, prompting the monitor to try to clean up in one go and causing the
observed resource usage and unresponsiveness?

I’ve been bringing in the storage as you described; I’m in the process of
adding 6PB of new storage to a ~10PB (raw) cluster (with ~8PB raw utilisation),
so I’m feeling around for the largest backfills we can safely do. I had been
weighting up storage in steps that take ~5 days to finish, but starting the
next reweight as we got to the tail end of the previous one, so the mons never
had time to compact their stores. Although it’s far from ideal in terms of the
total time to get the new storage weighted up, I’ll be letting the mons compact
between each backfill until I have a better idea of what went on last week.
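
Something like the following between reweight steps, for example (a sketch
only; the mon IDs and the default mon data path are assumptions):

    # Check each mon's store size (run on each mon host)
    du -sh /var/lib/ceph/mon/*/store.db

    # Once everything is back to active+clean, compact each mon in turn
    for m in mon1 mon2 mon3; do ceph tell mon.$m compact; done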

From: David Turner <[email protected]>
Sent: 17 May 2018 18:57
To: Byrne, Thomas (STFC,RAL,SC) <[email protected]>
Cc: Wido den Hollander <[email protected]>; [email protected]
Subject: Re: [ceph-users] A question about HEALTH_WARN and monitors holding 
onto cluster maps

Generally they clean up slowly by deleting 30 maps every time the maps update.
You can speed that up by creating false map updates, for example by updating a
pool setting to the value it already has. What it sounds like happened to you
is that your mon crashed and restarted. If it crashed and has the setting to
compact the mon store on start, then it would forcibly go through and clean
everything up in one go.
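
For example, something along these lines (just a sketch; 'rbd' and min_size
are placeholders for one of your pools and a setting you don't actually want
to change):

    # Read the current value, then write the same value back; each set is a
    # map update, which lets the mons delete another batch of old maps
    ceph osd pool get rbd min_size
    ceph osd pool set rbd min_size 2    # use whatever value the get returned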

I generally plan my backfilling to not take longer than a week.  Any longer
than that is pretty rough on the mons.  You can achieve that by bringing in new
storage with a weight of 0.0 and increasing it appropriately, as opposed to
just adding it with its full weight and having everything move at once.
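
Something like this, for instance (a sketch; osd.123, the weight steps, and
the config snippet are placeholders for your hardware):

    # In ceph.conf on the new OSD hosts, so new OSDs come up with zero weight
    [osd]
    osd crush initial weight = 0

    # Then raise the CRUSH weight in steps, waiting for active+clean between
    ceph osd crush reweight osd.123 1.0
    ceph osd crush reweight osd.123 2.0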

On Thu, May 17, 2018 at 12:56 PM Thomas Byrne - UKRI STFC 
<[email protected]<mailto:[email protected]>> wrote:
That seems like a sane way to do it, thanks for the clarification Wido.

As a follow-up, do you have any feeling as to whether the trimming is a
particularly intensive task? We just had a fun afternoon where the monitors
became unresponsive (no ceph status etc.) for several hours, seemingly due to
the leader's monitor process consuming all available RAM+swap (64GB+32GB) on
that monitor. This was then followed by the actual trimming of the stores
(26GB->11GB), which took a few minutes and happened simultaneously across the
monitors.
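
For what it's worth, the mons should still answer locally over their admin
sockets even when quorum commands like ceph status hang, along the lines of
the following (mon.mon1 being just an example ID):

    # Run on the mon host; does not need quorum
    ceph daemon mon.mon1 mon_status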

If this is something to be expected, it'll be a good reason to plan our long 
backfills much more carefully in the future!

> -----Original Message-----
> From: ceph-users <[email protected]>
> On Behalf Of Wido den Hollander
> Sent: 17 May 2018 15:40
> To: [email protected]<mailto:[email protected]>
> Subject: Re: [ceph-users] A question about HEALTH_WARN and monitors
> holding onto cluster maps
>
>
>
> On 05/17/2018 04:37 PM, Thomas Byrne - UKRI STFC wrote:
> > Hi all,
> >
> >
> >
> > As far as I understand, the monitor stores will grow while not
> > HEALTH_OK as they hold onto all cluster maps. Is this true for all
> > HEALTH_WARN reasons? Our cluster recently went into HEALTH_WARN
> due to
> > a few weeks of backfilling onto new hardware pushing the monitors' data
> > stores over the default 15GB threshold. Are they now prevented from
> > shrinking till I increase the threshold above their current size?
> >
>
> No, monitors will trim their data store when all PGs are active+clean,
> not when they are HEALTH_OK.
>
> So a 'noout' flag triggers a WARN, but that doesn't prevent the MONs from
> trimming for example.
>
> Wido
>
> >
> >
> > Cheers
> >
> > Tom
> >
> >
> >
> >
> >
> >
> >
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
