Hello Patrick,

On 27.11.23 19:05, Patrick Donnelly wrote:

> I would **really** love to see the debug logs from the MDS. Please
> upload them using ceph-post-file [1]. If you can reliably reproduce,
> turn on more debugging:
>
> ceph config set mds debug_mds 20
> ceph config set mds debug_ms 1
>
> [1] https://docs.ceph.com/en/reef/man/8/ceph-post-file/


I have uploaded the debug log and a core dump with ceph-post-file; the upload ID is 02f78445-7136-44c9-a362-410de37a0b7d. Unfortunately, we cannot easily shut down normal access to the cluster for these tests, so there is quite some clutter in the logs. The logs show three crashes, the last one with core dumping enabled (ulimits set to unlimited).

A note on reproducibility: To recreate the crash, reading the contents of the file prior to removal seems necessary. Simply calling stat on the file and then performing the removal also yields an Input/output error but does not crash the MDS.
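
As a rough sketch, the two access sequences compare as follows. The function names are only for illustration, and the path passed in is a placeholder, not the actual file on our cluster:

```shell
#!/bin/sh
# Sketch of the two access patterns described above.
# Function names and the path argument are placeholders for illustration.

# Pattern A: read the file contents, then remove the file.
# On the affected files this is the sequence that seems to be needed
# to crash the MDS.
read_then_remove() {
    cat -- "$1" > /dev/null   # read the full contents, discard the data
    rm -- "$1"                # removal is attempted even if the read failed
}

# Pattern B: only stat the file, then remove it.
# On the affected files this yields an Input/output error as well,
# but does not crash the MDS.
stat_then_remove() {
    stat -- "$1" > /dev/null  # metadata lookup only, no data read
    rm -- "$1"
}
```

Calling `read_then_remove /path/to/affected/file` corresponds to the crashing case, `stat_then_remove /path/to/affected/file` to the non-crashing one.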

Interestingly, the MDS_DAMAGE flag is reset on restart of the MDS and only comes back once the files in question are accessed (stat call is sufficient).
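
One way to watch the flag come and go is to look for the MDS_DAMAGE health code in the output of `ceph health detail`. The helper below is only a sketch: the function name is made up, and it assumes the health code appears literally as `MDS_DAMAGE` in that output:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if MDS_DAMAGE is
# currently listed in the cluster health details, fails otherwise.
mds_damage_flagged() {
    ceph health detail | grep -q 'MDS_DAMAGE'
}

# Print the current state of the flag in a human-readable form.
report_mds_damage() {
    if mds_damage_flagged; then
        echo "MDS_DAMAGE is set"
    else
        echo "MDS_DAMAGE is not set"
    fi
}
```

Running `report_mds_damage` once after the MDS restart and again after a `stat` on one of the affected files should show the transition described above.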


For now, I'll hold off on running first-damage.py to try to remove the affected files / inodes. Ultimately, however, this seems to be the most sensible solution to me, at least with regard to cluster downtime.

Cheers
Sebastian
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
