[ceph-users] Re: MDS_DAMAGE in 17.2.7 / Cannot delete affected files

Sebastian Knust Wed, 29 Nov 2023 12:12:08 -0800

Hello Patrick,

On 27.11.23 19:05, Patrick Donnelly wrote:


I would **really** love to see the debug logs from the MDS. Please
upload them using ceph-post-file [1]. If you can reliably reproduce,
turn on more debugging:

ceph config set mds debug_mds 20
ceph config set mds debug_ms 1


[1] https://docs.ceph.com/en/reef/man/8/ceph-post-file/

Uploaded debug log and core dump, see ceph-post-file:02f78445-7136-44c9-a362-410de37a0b7d. Unfortunately, we cannot easily shut down normal access to the cluster for these tests, so there is quite a lot of clutter in the logs. The logs show three crashes, the last one with core dumping enabled (ulimits set to unlimited).
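A minimal sketch of the debug-and-upload workflow discussed in this thread, wrapped in shell functions. It sets the debug levels Patrick suggested, removes the overrides afterwards with `ceph config rm` so the daemons fall back to their defaults, and uploads files with ceph-post-file's `-d` (description) option. The function names are hypothetical, and it assumes the `ceph` CLI is configured with admin credentials.

```shell
# Hypothetical helpers; assumes an admin keyring for the `ceph` CLI.

# Raise MDS logging as suggested in the thread.
enable_mds_debug() {
    ceph config set mds debug_mds 20   # verbose MDS logging
    ceph config set mds debug_ms 1     # messenger-level logging
}

# Drop the overrides again so the MDS daemons use their default levels.
disable_mds_debug() {
    ceph config rm mds debug_mds
    ceph config rm mds debug_ms
}

# $1: free-text description; remaining arguments: log files / core dumps.
upload_mds_logs() {
    local desc="$1"
    shift
    ceph-post-file -d "$desc" "$@"
}
```

Usage would be `enable_mds_debug`, reproduce the crash, `disable_mds_debug`, then something like `upload_mds_logs "MDS crash on unlink" <log files> <core dump>`. Log and core dump locations depend on the deployment (package install vs. cephadm/journald), so no paths are assumed here.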

A note on reproducibility: to recreate the crash, it seems necessary to read the contents of the file before removing it. Simply calling stat on the file and then performing the removal also yields an Input/output error, but does not crash the MDS.
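The two access patterns described above can be sketched as shell functions. The function names are hypothetical; the argument is meant to be the path of one of the affected files on the CephFS mount. Per the report, the second variant crashed the MDS, so this is only something to run in a maintenance window.

```shell
# Hypothetical helpers mirroring the two access patterns from the report.
# Each function returns the exit status of rm, so an EIO surfaces to the caller.

# Metadata lookup only, then unlink.
# Reported result: rm fails with Input/output error, MDS keeps running.
stat_then_remove() {
    local f="$1"
    stat -- "$f" >/dev/null || true
    rm -- "$f"
}

# Read the file's contents first, then unlink.
# Reported result: the MDS crashes during the removal.
read_then_remove() {
    local f="$1"
    cat -- "$f" >/dev/null || true
    rm -- "$f"
}
```

On a healthy file both functions simply delete it; the difference only shows up on the damaged inodes.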

Interestingly, the MDS_DAMAGE flag is cleared when the MDS restarts and only comes back once the files in question are accessed (a stat call is sufficient).
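A minimal sketch for observing this behaviour: check cluster health and the MDS damage table, stat an affected file, and check again. The function names are hypothetical, and the MDS daemon name is a placeholder (the active one is shown by `ceph fs status`).

```shell
# Hypothetical helpers; "$1" is the MDS daemon name (placeholder).

# Show whether MDS_DAMAGE is raised and dump the MDS damage table.
show_mds_damage() {
    local mds="$1"
    ceph health detail              # lists MDS_DAMAGE when it is raised
    ceph tell "mds.$mds" damage ls  # damage table entries (JSON)
}

# stat an affected file (reported to be enough to bring the flag back
# after an MDS restart), then re-check the damage state.
stat_and_recheck() {
    local mds="$1" f="$2"
    stat -- "$f" >/dev/null 2>&1 || true
    show_mds_damage "$mds"
}
```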

For now, I'll hold off on running first-damage.py to try to remove the affected files / inodes. Ultimately, however, this seems to be the most sensible solution to me, at least with regard to cluster downtime.


Cheers
Sebastian
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
