The only active MDS has a RAM usage of about 50 GB. The memory limit is 32 GB, but we get no warning about that. Maybe the separate purge_queue is consuming a lot of RAM and does not count toward the limit? Usually we are notified when the MDS exceeds its memory limit.
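For completeness, this is how we would compare the configured cache limit with what the daemon itself reports (mds.0 is just a placeholder for our daemon name, and we are not sure the limit covers anything beyond the metadata cache):

ceph config get mds mds_cache_memory_limit   # configured cache limit in bytes
ceph daemon mds.0 cache status               # the MDS's own accounting of cache memory actually in use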
Thank you!

On 22.05.23 15:23, [email protected] wrote:
Hi Venky,
Thank you for your help. We managed to shut down mds.1:
We set "ceph fs set max_mds 1" and waited for about 30 minutes. In the first
couple minutes, strays were migrated from mds.1 to mds.0. After this, the stray export
hung. The mds.1 remained in the state_stopping. After about 30 minutes, we restarted
mds.1. This resulted in one active mds and two standby mds. However, we are not sure, if
the remaining strays could be migrated.
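For reference, this is roughly how we checked the rank states and the remaining stray counts during the shutdown (the daemon name is again a placeholder):

ceph fs status                                  # shows which ranks are active/stopping and which daemons are standby
ceph daemon mds.0 perf dump | grep num_strays   # stray count on the remaining active rank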
When we had a closer look at the perf counters of the MDS, we noticed that
strays_enqueued is quite high and constantly increasing. Is this to be expected? What
does the counter "strays_enqueued" mean exactly?
ceph daemon mds.0 perf dump | grep stray
"num_strays": 49846,
"num_strays_delayed": 21,
"num_strays_enqueuing": 0,
"strays_created": 2042124,
"strays_enqueued": 2396076,
"strays_reintegrated": 44207,
"strays_migrated": 38,
Would it be safe to perform "ceph orch upgrade resume" at this point? At the
moment, the MONs and OSDs are running 17.2.6, while the MDSs and RGWs are still on 17.2.5,
so we will have to upgrade the MDSs and RGWs eventually.
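Before resuming we would probably sanity-check the orchestrator state along these lines (a sketch, assuming the cephadm orchestrator is managing the cluster):

ceph orch upgrade status   # whether the upgrade to 17.2.6 is paused or still in progress
ceph versions              # which daemons are still on 17.2.5
ceph orch upgrade resume   # continue the paused upgrade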
Best, Tobias
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
