On 01/05/2023 11:35, Frank Schilder wrote:
Hi all,
I think we might be hitting a known problem
(https://tracker.ceph.com/issues/57244). I don't want to fail the MDS yet,
because we have trouble with older kclients that miss the MDS restart and hold
on to cache entries referring to the killed instance, leading to hanging jobs
on our HPC cluster.
I have seen this issue before, and there was a process in D state that had
deadlocked itself. Usually, killing this process succeeded and resolved the
issue. However, this time I can't find such a process.
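When a deadlocked process in D state is the suspected culprit, it can be located on a client node without knowing its name in advance. The following is a minimal sketch, assuming a Linux client with procps `ps` and a readable `/proc`. `list_dstate` is a hypothetical helper name, not an existing tool:

```shell
#!/usr/bin/env bash
# Sketch: list processes in uninterruptible sleep (state D) on a client,
# with their command line and, if readable, their kernel stack.
# Assumes Linux with procps ps; /proc/PID/stack usually requires root.
list_dstate() {
  local pid
  # ps prints "PID STAT" without headers; keep PIDs whose state starts with D.
  for pid in $(ps -eo pid=,stat= | awk '$2 ~ /^D/ {print $1}'); do
    # The process may have exited since ps ran; skip it if so.
    [ -r "/proc/$pid/cmdline" ] || continue
    printf '== PID %s: %s\n' "$pid" "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
    cat "/proc/$pid/stack" 2>/dev/null || echo "   (kernel stack not readable; try as root)"
  done
}

list_dstate
```

A kernel stack containing `ceph_*` functions suggests the task is waiting in the CephFS kernel client, though that is only a heuristic.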
The tracker mentions that one can delete the file/folder. I have the inode
number, but I really don't want to start a find on a 1.5 PB file system. Is
there a better way to find which path is causing the issue (ask the MDS
directly, look at a cache dump, or similar)? Is there an alternative to
deleting the file or failing the MDS?
Hello,
If you have the inode number, you can retrieve the full path with something like:
rados getxattr -p $POOL ${ino}.00000000 parent | \
ceph-dencoder type inode_backtrace_t import - decode dump_json | \
jq -M '[.ancestors[].dname]' | tr -d '[[",\]]' | \
awk 't!=""{t=$1 "/" t;}t==""{t=$1;}END{print t}'
Where $POOL is the name of the file system's default data pool (for files) or
of the metadata pool (for directories), and $ino is the inode number in
hexadecimal, without the 0x prefix.
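The same steps can be wrapped into reusable shell functions. This is a minimal sketch, assuming bash, jq, `rados` and `ceph-dencoder` are installed. The function names `ino_hex`, `backtrace_to_path` and `ino_to_path` are hypothetical. It differs from the `tr`/`awk` stage in one respect: jq reverses and joins the `dname` list itself, so file names containing spaces are kept intact. It also accepts the inode number in decimal or 0x-prefixed hex and converts it to the bare hex form used in object names.

```shell
#!/usr/bin/env bash
# Sketch: resolve a CephFS inode number to a path via its backtrace.
# Assumes bash, jq, rados and ceph-dencoder are installed; the function
# names below are hypothetical helpers, not part of any Ceph tool.
set -euo pipefail

# Print the inode number as bare lowercase hex, as used in RADOS object
# names. Accepts decimal (1099511627776) or 0x-prefixed hex (0x10000000000).
# NOTE: a hex value WITHOUT the 0x prefix would be misread as decimal.
ino_hex() {
  printf '%x\n' "$1"
}

# Read a decoded inode_backtrace_t JSON document on stdin and print the path.
# .ancestors lists dentry names from the inode itself up towards the root,
# so the list is reversed before joining.
backtrace_to_path() {
  jq -r '"/" + ([.ancestors[].dname] | reverse | join("/"))'
}

# ino_to_path POOL INO
# POOL: default data pool for files, metadata pool for directories.
ino_to_path() {
  local pool=$1 obj
  obj="$(ino_hex "$2").00000000"
  rados -p "$pool" getxattr "$obj" parent \
    | ceph-dencoder type inode_backtrace_t import - decode dump_json \
    | backtrace_to_path
}
```

For example, `ino_to_path cephfs_data 0x10000000000` would print the path of that file, where `cephfs_data` stands in for your actual default data pool name.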
Loïc.
--
| Loïc Tortay <[email protected]> - IN2P3 Computing Centre |
_______________________________________________
ceph-users mailing list -- [email protected]