Good day.
Our file system (mimic 13.2.10) is down. Our MDSes are choking on an
operation:
2021-09-19 02:23:36.432664 mon.ceph-01 mon.0 192.168.32.65:6789/0 185676 : cluster [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
[...]
2021-09-19 02:23:34.909269 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1662 : cluster [WRN] 33 slow requests, 5 included below; oldest blocked for > 32.729621 secs
2021-09-19 02:23:34.909277 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1663 : cluster [WRN] slow request 31.104289 seconds old, received at 2021-09-19 02:23:03.804307: client_request(client.44559846:1121833 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
2021-09-19 02:23:34.909280 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1664 : cluster [WRN] slow request 31.104254 seconds old, received at 2021-09-19 02:23:03.804343: client_request(client.44559846:1121834 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
2021-09-19 02:23:34.909283 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1665 : cluster [WRN] slow request 31.104231 seconds old, received at 2021-09-19 02:23:03.804365: client_request(client.44559846:1121835 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
2021-09-19 02:23:34.909285 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1666 : cluster [WRN] slow request 31.104213 seconds old, received at 2021-09-19 02:23:03.804384: client_request(client.44559846:1121836 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
2021-09-19 02:23:34.909288 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1667 : cluster [WRN] slow request 31.104142 seconds old, received at 2021-09-19 02:23:03.804455: client_request(client.44559846:1121837 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
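With thousands of such entries in the cluster log, it can help to tally them by client, operation, path and blocking state to check whether they all hit the same directory. The Python sketch below is a hypothetical helper (the regex and the name `summarize` are ours, not part of Ceph). It assumes each entry sits on a single line, as in the raw cluster log rather than the wrapped copy in a mail, and that the format matches the mimic entries shown above.

```python
import re
from collections import Counter

# Assumed format of one mimic "slow request" cluster-log entry (single line).
# Group names are our own labels, not Ceph terminology.
SLOW_REQ = re.compile(
    r"slow request (?P<age>[\d.]+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"client_request\((?P<client>client\.\d+):(?P<tid>\d+) "
    r"(?P<op>\w+) (?P<path>\S+) .*?\) currently (?P<state>.+)$"
)


def summarize(lines):
    """Count slow requests per (client, op, path, state) and track the oldest age.

    Lines that do not look like a slow-request entry are ignored.
    Returns (Counter, oldest_age_in_seconds).
    """
    counts = Counter()
    oldest = 0.0
    for line in lines:
        m = SLOW_REQ.search(line)
        if m is None:
            continue
        counts[(m["client"], m["op"], m["path"], m["state"].strip())] += 1
        oldest = max(oldest, float(m["age"]))
    return counts, oldest
```

Any iterable of lines works, for example an open file handle on a copy of the cluster log; `counts.most_common(10)` then lists the hottest client/path/state combinations.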
By now, several thousand authpin operations have been stuck for hours. The
file system is essentially inoperable and work is piling up:
# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 2 MDSs behind on trimming; 20 large omap objects
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdsceph-15(mds.3): 1554 slow requests are blocked > 30 secs
MDS_TRIM 2 MDSs behind on trimming
    mdsceph-23(mds.0): Behind on trimming (7651/128) max_segments: 128, num_segments: 7651
    mdsceph-15(mds.3): Behind on trimming (4888/128) max_segments: 128, num_segments: 4888
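The figures in the MDS_TRIM lines are num_segments/max_segments, so the backlog can be read as a multiple of the limit: rank 0 holds 7651/128, about 60 times, and rank 3 holds 4888/128, about 38 times, as many log segments as allowed. A small sketch that pulls these numbers out of `ceph health detail` text; the helper name `trim_backlog` and its regex are hypothetical and assume the mimic output format shown above.

```python
import re

# Assumed shape of an MDS_TRIM detail line, e.g.
#   mdsceph-23(mds.0): Behind on trimming (7651/128) ...
TRIM = re.compile(
    r"(?P<daemon>\S+)\((?P<rank>mds\.\d+)\): Behind on trimming "
    r"\((?P<num>\d+)/(?P<max>\d+)\)"
)


def trim_backlog(text):
    """Map each MDS rank to (num_segments, max_segments, num_segments / max_segments)."""
    out = {}
    for m in TRIM.finditer(text):
        num, mx = int(m["num"]), int(m["max"])
        out[m["rank"]] = (num, mx, num / mx)
    return out
```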
I would be grateful for advice on how to get out of this. Current fs status is:
# ceph fs status
con-fs2 - 1636 clients
=======
+------+--------+---------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+---------+---------------+-------+-------+
| 0 | active | ceph-23 | Reqs: 2 /s | 2024k | 2019k |
| 1 | active | ceph-12 | Reqs: 0 /s | 1382k | 1374k |
| 2 | active | ceph-08 | Reqs: 0 /s | 998k | 926k |
| 3 | active | ceph-15 | Reqs: 0 /s | 1373k | 1272k |
+------+--------+---------+---------------+-------+-------+
+---------------------+----------+-------+-------+
| Pool | type | used | avail |
+---------------------+----------+-------+-------+
| con-fs2-meta1 | metadata | 102G | 1252G |
| con-fs2-meta2 | data | 0 | 1252G |
| con-fs2-data | data | 1359T | 6003T |
| con-fs2-data-ec-ssd | data | 239G | 4006G |
| con-fs2-data2 | data | 56.0T | 5457T |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| ceph-16 |
| ceph-14 |
| ceph-13 |
| ceph-17 |
| ceph-10 |
| ceph-24 |
| ceph-09 |
| ceph-11 |
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Building 109, room S14
_______________________________________________
ceph-users mailing list