Right, for the example from Ryan (and, given the thread name, you already know it is writing to a file or directory). For other cases it may take more steps to figure out which access to which file is causing the long waiters (e.g., when mmap is being used on some nodes, or a token revoke is pending from some node, etc.).
Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------
If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWorks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479 .

If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract, please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team.

From: Ryan Novosielski <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 2019/10/18 09:18 AM
Subject: [EXTERNAL] Re: [gpfsug-discuss] waiters and files causing waiters
Sent by: [email protected]

Found my notes on this; very similar to what Behrooz was saying. The output below is from “mmfsadm dump waiters,selected_files”; as you can see, we’re looking at thread 29168. In the “dump selected_files” output, “inodeFlushHolder” corresponds to that same thread in the case I was looking at. You can then look up the inode with “tsfindinode -i <inode> <fsname>”, so for the output below, “tsfindinode -i 41538053 /gpfs/cache” on our system (a scripted sketch of this lookup follows the dump output).

===== dump waiters =====
Current time 2019-05-01_13:48:26-0400
Waiting 0.1669 sec since 13:48:25, monitored, thread 29168 FileBlockWriteFetchHandlerThread: on ThCond 0x7F55E40014C8 (MsgRecordCondvar), reason 'RPC wait' for quotaMsgRequestShare on node 192.168.33.7 <c1n1>

===== dump selected_files =====
Current time 2019-05-01_13:48:36-0400
...
OpenFile: 4E044E5B0601A8C0:000000000279D205:0000000000000000 @ 0x1806AC5EAC8 cach 1 ref 1 hc 2 tc 6 mtx 0x1806AC5EAF8
  Inode: valid eff token xw @ 0x1806AC5EC70, ctMode xw seq 170823
    lock state [ wf: 1 ] x [] flags [ ]
  Mnode: valid eff token xw @ 0x1806AC5ECC0, ctMode xw seq 170823
  DMAPI: invalid eff token nl @ 0x1806AC5EC20, ctMode nl seq 170821
  SMBOpen: valid eff token (A:RMA D: ) @ 0x1806AC5EB50, ctMode (A:RMA D: ) seq 170823
    lock state [ M(2) D: ] x [] flags [ ]
  SMBOpLk: valid eff token wf @ 0x1806AC5EBC0, ctMode wf Flags 0x30 (pfro+pfxw) seq 170822
  BR: @ 0x1806AC5ED20, ctMode nl Flags 0x10 (pfro) seq 170823
    treeP 0x18016189C08 C btFastTrack 0 1 ranges mode RO/XW:
      BLK [0,INF] mode XW node <403>
  Fcntl: @ 0x1806AC5ED48, ctMode nl Flags 0x30 (pfro+pfxw) seq 170823
    treeP 0x18031A5E3F8 C btFastTrack 0 1 ranges mode RO/XW:
      BLK [0,INF] mode XW node <403>
  inode 41538053 snap 0 USERFILE nlink 1 genNum 0x3CC2743F mode 0200100600: -rw-------
  tmmgr node <c1n1> (other) metanode <c1n403> (me) fail+panic count -1 flags 0x0, remoteStart 0 remoteCnt 0 localCnt 177 lastFrom 65535 switchCnt 0
  locks held in mode xw:
    0x1806AC5F238: 0x0-0xFFF tid 15954 gbl 0 mode xw rel 0
  BRL nXLocksOrRelinquishes 285
  vfsReference 1
  dioCount 0 dioFlushNeeded 1 dioSkipCounter 0 dioReentryThreshold 0.000000
  hasWriterInstance 1
  inodeFlushFlag 1 inodeFlushHolder 29168
  openInstCount 1
  metadataFlushCount 2, metadataFlushWaiters 0/0, metadataCommitVersion 1
  bufferListCount 1 bufferListChangeCount 3
  dirty status: flushed dirtiedSyncNum 1477623
  SMB oplock state: nWriters 1
  indBlockDeallocLock: sharedLockWord 1 exclLockWord 0 upgradeWaitingS_W 0 upgradeWaitingW_X 0
  inodeValid 1
  objectVersion 240 flushVersion 8086700 mnodeChangeCount 1
  block size code 5 (32 subblocksPerFileBlock)
  dataBytesPerFileBlock 4194304
  fileSize 0 synchedFileSize 0 indirectionLevel 1
  atime 1556732911.496160000
  mtime 1556732911.496479000
  ctime 1556732911.496479000
  crtime 1556732911.496160000
  owner uid 169589 gid 169589
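Putting those steps together, here is a rough, untested sketch of the lookup as a shell one-off. Only mmfsadm and tsfindinode are the real commands shown above; the temp-file path, example values, and the awk field positions are assumptions based on the dump layout in this mail, which can differ between Scale releases:

# Rough, untested sketch -- assumes the dump layout shown above, which may differ by release.
fs=/gpfs/cache        # filesystem path (example value from above)
tid=29168             # thread id taken from the long waiter line
mmfsadm dump waiters,selected_files > /tmp/scale_dump.out

# In the selected_files section, remember the most recent "inode <number>" line and
# print it when an OpenFile entry's inodeFlushHolder matches the waiter's thread id.
ino=$(awk -v tid="$tid" '
    $1 == "inode" { ino = $2 }
    /inodeFlushHolder/ {
        for (i = 1; i < NF; i++)
            if ($i == "inodeFlushHolder" && $(i + 1) == tid) print ino
    }' /tmp/scale_dump.out)

tsfindinode -i "$ino" "$fs"   # map the inode number back to a path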
> On Oct 10, 2019, at 4:43 PM, Damir Krstic <[email protected]> wrote:
>
> Is it possible, via some set of mmdiag --waiters or mmfsadm dump options, to figure out which file or directory accesses (whether read or write) are causing the longer waiters?
>
> In all my looking I have not been able to get that information out of the various diagnostic commands.
>
> thanks,
> Damir
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
