Any idea what this hang condition is all about? I have several nodes all in a
sort of deadlock, with the following long waiters. I know I'm probably looking
at a PMR, but are there any other clues on what might be at work? GPFS 4.1.0.7
on Linux, RH 6.6.

They all seem to trace back to nodes where 'waiting for the flush flag to commit
metadata' and 'waiting for WW lock' are the waits in question.

  0x7F418C0C07D0 (  18869) waiting 203445.829057195 seconds, InodePrefetchWorkerThread: on ThCond 0x7F41FC02A338 (0x7F41FC02A338) (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.30.105.68 <c1n178>
  0x7F418C0C66D0 (  18876) waiting 196174.410095017 seconds, InodePrefetchWorkerThread: on ThCond 0x7F40AC8AB798 (0x7F40AC8AB798) (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.30.86.102 <c1n373>
  0x7F9C5C0041F0 (  17394) waiting 218020.428801654 seconds, SyncHandlerThread: on ThCond 0x1801970D678 (0xFFFFC9001970D678) (InodeFlushCondVar), reason 'waiting for the flush flag to commit metadata'
  0x7FEAC0037F10 (  25547) waiting 158003.275282910 seconds, InodePrefetchWorkerThread: on ThCond 0x7FEBA400E398 (0x7FEBA400E398) (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.30.86.159 <c2n312>
  0x7F04B0028E80 (  11757) waiting 159426.694691653 seconds, InodePrefetchWorkerThread: on ThCond 0x7F0400002A28 (0x7F0400002A28) (MsgRecordCondvar), reason 'RPC wait' for tmMsgTellAcquire1 on node 10.30.43.226 <c1n5>
  0x7F04D0013AA0 (  21781) waiting 157723.199692503 seconds, InodePrefetchWorkerThread: on ThCond 0x7F0454010358 (0x7F0454010358) (MsgRecordCondvar), reason 'RPC wait' for tmMsgTellAcquire1 on node 10.30.43.227 <c1n7>
  0x7F6F480041F0 (  12964) waiting 209491.171775225 seconds, SyncHandlerThread: on ThCond 0x18022F3C490 (0xFFFFC90022F3C490) (InodeFlushCondVar), reason 'waiting for the flush flag to commit metadata'
  0x7F03180041F0 (  12338) waiting 212486.480961641 seconds, SyncHandlerThread: on ThCond 0x18027186220 (0xFFFFC90027186220) (LkObjCondvar), reason 'waiting for WW lock'
  0x7F1EB00041F0 (  12598) waiting 215765.483202551 seconds, SyncHandlerThread: on ThCond 0x18026FDFDD0 (0xFFFFC90026FDFDD0) (InodeFlushCondVar), reason 'waiting for the flush flag to commit metadata'
  0x7F83540041F0 (  12605) waiting 75189.385741859 seconds, SyncHandlerThread: on ThCond 0x18021DAA7F8 (0xFFFFC90021DAA7F8) (InodeFlushCondVar), reason 'waiting for the flush flag to commit metadata'
  0x7FF10C20DA10 (  34836) waiting 202382.680544395 seconds, InodePrefetchWorkerThread: on ThCond 0x7FF1640026C8 (0x7FF1640026C8) (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.30.86.77 <c1n337>
  0x7F839806DBF0 (  49131) waiting 158295.556723453 seconds, InodePrefetchWorkerThread: on ThCond 0x7F82B0000FF8 (0x7F82B0000FF8) (MsgRecordCondvar), reason 'RPC wait' for tmMsgTellAcquire1 on node 10.30.43.226 <c2n5>
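To see which nodes the RPC waiters point at, and how long each kind of wait has been stuck, the dump can be tallied mechanically. Below is a minimal Python sketch. It assumes each waiter sits on a single line in exactly the format pasted here (mail clients tend to wrap them). The regex, the `summarize` helper and the `SAMPLE` text are illustrative only and are not part of any GPFS tool.

```python
import re
from collections import Counter, defaultdict

# Assumed layout of one waiter line, taken from the dump above:
#   <addr> (<tid>) waiting <secs> seconds, <Thread>: on ThCond ... reason '<why>'
#   optionally followed by: for <rpcName> on node <ip> <<nodeName>>
WAITER_RE = re.compile(
    r"waiting (?P<secs>[\d.]+) seconds, (?P<thread>\w+):.*?"
    r"reason '(?P<reason>[^']+)'"
    r"(?: for (?P<msg>\w+) on node (?P<ip>[\d.]+) <(?P<node>[^>]+)>)?"
)


def summarize(dump: str):
    """Count waiters per reason and per (RPC, target node); keep the longest wait per reason."""
    by_reason = Counter()
    by_target = Counter()
    longest = defaultdict(float)  # reason -> longest wait in seconds
    for line in dump.splitlines():
        m = WAITER_RE.search(line)
        if not m:
            continue  # skip anything that is not a waiter line
        reason = m.group("reason")
        by_reason[reason] += 1
        longest[reason] = max(longest[reason], float(m.group("secs")))
        if m.group("node"):  # only RPC waits name a target node
            by_target[(m.group("msg"), m.group("node"))] += 1
    return by_reason, by_target, longest


# Three waiters copied from the dump above, one per line.
SAMPLE = """\
0x7F418C0C07D0 (  18869) waiting 203445.829057195 seconds, InodePrefetchWorkerThread: on ThCond 0x7F41FC02A338 (0x7F41FC02A338) (MsgRecordCondvar), reason 'RPC wait' for tmMsgRevoke on node 10.30.105.68 <c1n178>
0x7F9C5C0041F0 (  17394) waiting 218020.428801654 seconds, SyncHandlerThread: on ThCond 0x1801970D678 (0xFFFFC9001970D678) (InodeFlushCondVar), reason 'waiting for the flush flag to commit metadata'
0x7F03180041F0 (  12338) waiting 212486.480961641 seconds, SyncHandlerThread: on ThCond 0x18027186220 (0xFFFFC90027186220) (LkObjCondvar), reason 'waiting for WW lock'
"""

if __name__ == "__main__":
    reasons, targets, longest = summarize(SAMPLE)
    for reason, n in reasons.most_common():
        print(f"{n:3d}  {reason}  (longest {longest[reason] / 3600:.1f} h)")
    for (msg, node), n in targets.most_common():
        print(f"{n:3d}  {msg} -> {node}")
```

Running the same tally over the waiters from every hung node would show whether the tmMsgRevoke and tmMsgTellAcquire1 RPCs converge on the nodes stuck in the flush-flag and WW-lock waits.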

Bob Oesterlin
Sr Storage Engineer, Nuance Communications
507-269-0413

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
