Hi Everyone,

We have a cluster whose manager is not behaving well. The mgrs are all very
slow to respond, which initially caused them to fail over continuously.

We've disabled most of the modules. 

We've also set the following options, which seemed to improve the situation a
little, but the problem came back:

ms_async_op_threads = 10
ms_async_max_op_threads = 16
mgr_stats_period = 10

However, the ms_dispatch thread is at 99.9% CPU all the time. If we fail the
manager, the ms_dispatch thread on the new active mgr goes to 99.9% as well.
We have restarted all mon and mgr daemons.

The perf dump shows an extremely high get_or_fail_fail count:

"throttle-mgr_mon_messsages": {
        "val": 128,
        "max": 128,
        "get_started": 0,
        "get": 1191,
        "get_sum": 1191,
        "get_or_fail_fail": 188691955,
        "get_or_fail_success": 1191,
        "take": 0,
        "take_sum": 0,
        "put": 1191,
        "put_sum": 1191,
        "wait": {
            "avgcount": 0,
            "sum": 0.000000000,
            "avgtime": 0.000000000
        }
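
For anyone who wants to check their own mgr for the same symptom, here is a
rough sketch that scans a perf dump for throttles that are pinned at their
limit and rejecting most requests. It assumes the dump was obtained as JSON
(for example from `ceph daemon mgr.<id> perf dump`) and that throttle sections
are named "throttle-...", as in the output above. The function name
saturated_throttles and the min_fail_ratio threshold are made up for
illustration and are not part of Ceph.

```python
import json


def saturated_throttles(perf_dump, min_fail_ratio=0.5):
    """Return (name, val, max, fail_ratio) for throttles stuck at their limit.

    perf_dump is the parsed JSON of a daemon's perf dump. A throttle is
    flagged when val has reached max and at least min_fail_ratio of its
    get_or_fail calls were rejected. The threshold is an arbitrary choice.
    """
    flagged = []
    for name, counters in perf_dump.items():
        if not name.startswith("throttle-"):
            continue
        fails = counters.get("get_or_fail_fail", 0)
        successes = counters.get("get_or_fail_success", 0)
        attempts = fails + successes
        if attempts == 0:
            continue
        limit = counters.get("max", 0)
        at_limit = limit > 0 and counters.get("val", 0) >= limit
        ratio = fails / attempts
        if at_limit and ratio >= min_fail_ratio:
            flagged.append((name, counters["val"], limit, ratio))
    return flagged


# Values copied from the perf dump above (only the fields the check needs).
sample = json.loads("""
{
    "throttle-mgr_mon_messsages": {
        "val": 128,
        "max": 128,
        "get_or_fail_fail": 188691955,
        "get_or_fail_success": 1191
    }
}
""")

for name, val, limit, ratio in saturated_throttles(sample):
    print(f"{name}: {val}/{limit} in use, "
          f"{ratio:.4%} of get_or_fail calls rejected")
```

With the numbers above, the throttle is full (128 of 128) and almost every
get_or_fail call is rejected, which is the pattern we are seeing.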

Thanks,
Wout
