[ceph-users] Re: client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

Loic Tortay Mon, 01 May 2023 06:06:59 -0700

On 01/05/2023 11:35, Frank Schilder wrote:

> Hi all,
>
> I think we might be hitting a known problem
> (https://tracker.ceph.com/issues/57244). I don't want to fail the MDS yet,
> because we have trouble with older kclients that miss the MDS restart and hold
> on to cache entries referring to the killed instance, leading to hanging jobs
> on our HPC cluster.
>
> I have seen this issue before, and there was a process in D-state that
> had deadlocked itself. Usually, killing this process succeeded and resolved the
> issue. However, this time I can't find such a process.
>
> The tracker mentions that one can delete the file/folder. I have the inode
> number, but really don't want to start a find on a 1.5PB file system. Is there
> a better way to find what path is causing the issue (ask the MDS directly, look
> at a cache dump, or similar)? Is there an alternative to deletion or MDS fail?

Hello,
If you have the inode number, you can retrieve the name with something like:
 rados getxattr -p $POOL ${ino}.00000000 parent | \
  ceph-dencoder type inode_backtrace_t import - decode dump_json | \
  jq -M '[.ancestors[].dname]' | tr -d '[[",\]]' | \
  awk 't!=""{t=$1 "/" t;}t==""{t=$1;}END{print t}'

Where $POOL is the name of the "default pool" (for files) or of the metadata pool (for directories), and $ino is the inode number in hexadecimal, without a "0x" prefix.
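
The pipeline above can be split into two small helpers, sketched here under some assumptions: the function names ino_to_object and join_ancestors are made up for this example, the inode may be given in decimal (as shown by "ls -i") or with a "0x" prefix, and jq is asked for raw output (-r) so that names containing spaces survive. The ancestors are assumed to be listed from the entry itself up towards the root, as in the command above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Ceph: a sketch of the steps described above.

# Build the RADOS object name "<hex inode>.00000000" from an inode number.
# Accepts decimal (e.g. from "ls -i") or 0x-prefixed hexadecimal input.
ino_to_object() {
    printf '%x.00000000\n' "$1"
}

# Read one dentry name per line, nearest entry first, and print the path
# from the file system root. Whole lines are used, so spaces are preserved.
join_ancestors() {
    awk '{ path = $0 (NR > 1 ? "/" path : "") } END { print "/" path }'
}

# Possible usage (needs rados, ceph-dencoder and jq on an admin node):
#   obj=$(ino_to_object "$ino")
#   rados getxattr -p "$POOL" "$obj" parent | \
#     ceph-dencoder type inode_backtrace_t import - decode dump_json | \
#     jq -r '.ancestors[].dname' | join_ancestors
```

The printed path is relative to the root of the CephFS file system, not to a client mount point.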



Loïc.
--
|   Loïc Tortay <[email protected]>  -     IN2P3 Computing Centre     |
