Just to piggyback further: the hardest the mgr seems to get pushed is when the balancer is engaged. When evaluating a pool or the cluster, it takes 30-120 seconds to score it, then another 30-120 seconds to execute the plan, and it never seems to engage automatically.
> $ time ceph balancer status
> {
>     "active": true,
>     "plans": [],
>     "mode": "upmap"
> }
>
> real 0m36.490s
> user 0m0.259s
> sys 0m0.044s
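For anyone wanting to reproduce those timings, the manual balancer cycle I'm describing is roughly the following (the plan name is just an example):

```shell
# score the current data distribution (the slow "eval" step)
ceph balancer eval

# build an optimization plan, then apply it (the slow "execute" step)
ceph balancer optimize myplan
ceph balancer execute myplan

# check whether the balancer reports itself active
ceph balancer status
```

The eval and execute steps are where I see the 30-120 second stalls.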
I'm going to disable mine as well, and see if I can stop waking up to 'No
Active MGR.'
You can see when I lose mgrs because RBD image stats drop to 0 until I catch it.
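In case it helps anyone else trying the same thing, the disable-and-verify steps are just (assuming the balancer module really is the culprit; watch mgr CPU afterwards to confirm):

```shell
# turn the balancer off and confirm it reports "active": false
ceph balancer off
ceph balancer status

# confirm an active mgr is still present
ceph -s | grep mgr
```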
Thanks,
Reed
> On Aug 27, 2019, at 11:24 AM, Jake Grimmett <[email protected]> wrote:
>
> Hi Reed, Lenz, John
>
> I've just tried disabling the balancer; so far ceph-mgr is keeping its
> CPU mostly under 20%, even with both the iostat and dashboard back on.
>
> # ceph balancer off
>
> was
> [root@ceph-s1 backup]# ceph balancer status
> {
>     "active": true,
>     "plans": [],
>     "mode": "upmap"
> }
>
> now
> [root@ceph-s1 backup]# ceph balancer status
> {
>     "active": false,
>     "plans": [],
>     "mode": "upmap"
> }
>
> We are using 8+2 erasure coding across 324 x 12 TB OSDs, plus 4 NVMe OSDs
> for a replicated cephfs metadata pool.
>
> Let me know if the balancer is your problem too...
>
> best,
>
> Jake
>
> On 8/27/19 3:57 PM, Jake Grimmett wrote:
>> Yes, the problem still occurs with the dashboard disabled...
>>
>> Possibly relevant, when both the dashboard and iostat plugins are
>> disabled, I occasionally see ceph-mgr rise to 100% CPU.
>>
>> As suggested by John Hearns, the output of gstack on ceph-mgr when at
>> 100% is here:
>>
>> http://p.ip.fi/52sV
>>
>> many thanks
>>
>> Jake
>>
>> On 8/27/19 3:09 PM, Reed Dier wrote:
>>> I'm currently seeing this with the dashboard disabled.
>>>
>>> My instability decreases, but isn't wholly cured, by disabling
>>> prometheus and rbd_support, which I use in tandem, as the only thing I'm
>>> using the prom-exporter for is the per-rbd metrics.
>>>
>>>> ceph mgr module ls
>>>> {
>>>>     "enabled_modules": [
>>>>         "diskprediction_local",
>>>>         "influx",
>>>>         "iostat",
>>>>         "prometheus",
>>>>         "rbd_support",
>>>>         "restful",
>>>>         "telemetry"
>>>>     ],
>>>
>>> I'm on Ubuntu 18.04, so that doesn't point to any OS-specific
>>> correlation.
>>>
>>> Thanks,
>>>
>>> Reed
>>>
>>>> On Aug 27, 2019, at 8:37 AM, Lenz Grimmer <[email protected]> wrote:
>>>>
>>>> Hi Jake,
>>>>
>>>> On 8/27/19 3:22 PM, Jake Grimmett wrote:
>>>>
>>>>> That exactly matches what I'm seeing:
>>>>>
>>>>> when iostat is working OK, I see ~5% CPU use by ceph-mgr
>>>>> and when iostat freezes, ceph-mgr CPU increases to 100%
>>>>
>>>> Does this also occur if the dashboard module is disabled? Just wondering
>>>> if this can be isolated to the iostat module. Thanks!
>>>>
>>>> Lenz
>>>>
>>>> --
>>>> SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nuernberg
>>>> GF: Felix Imendörffer, HRB 247165 (AG Nürnberg)
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> [email protected]
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>>
>>
>>
>
>
> --
> MRC Laboratory of Molecular Biology
> Francis Crick Avenue,
> Cambridge CB2 0QH, UK.
>
