This might be a stretch, but do you happen to have a user, fileset, or group over its hard quota, or over its soft quota plus grace period? We've had this really upset our cluster before. At least with 3.5, each op done against an over-quota user/group/fileset results in at least one RPC from the fs manager to every node in the cluster.

Are those waiters from an fs manager node? If so, perhaps briefly fire up tracing (/usr/lpp/mmfs/bin/mmtrace start), let it run for ~10 seconds, then stop it (/usr/lpp/mmfs/bin/mmtrace stop) and grep for "TRACE_QUOTA" in the resulting trcrpt file. If you see a bunch of lines that contain:

TRACE_QUOTA: qu.server revoke reply type

that might be what's going on. You can also see the behavior if you look at the output of mmdiag --network on your fs manager nodes and see a bunch of RPCs with all of your cluster nodes listed as the recipients. I can't recall what the RPC you're looking for is called, though.
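A quick sketch of that workflow (the mmtrace/grep commands are from the steps above; the trcrpt path and the sample records below are hypothetical stand-ins, since the real file is produced by mmtrace on a GPFS node):

```shell
# On the fs manager node (requires GPFS installed):
#   /usr/lpp/mmfs/bin/mmtrace start
#   sleep 10
#   /usr/lpp/mmfs/bin/mmtrace stop
# mmtrace stop writes a trcrpt file; the path below is a mocked-up
# example so the grep step can be shown, not the real trace output.
cat > /tmp/sample_trcrpt <<'EOF'
0.000123  TRACE_TS: some unrelated trace record
0.000456  TRACE_QUOTA: qu.server revoke reply type
0.000789  TRACE_QUOTA: qu.server revoke reply type
EOF

# Count the quota-revoke records; a large count over a ~10 second
# trace window would point at the over-quota revoke storm described.
grep -c 'TRACE_QUOTA: qu.server revoke reply type' /tmp/sample_trcrpt
# prints 2
```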

Hope that helps!

-Aaron

On 1/26/17 7:57 PM, Oesterlin, Robert wrote:
OK, I have a sick cluster, and it seems to be tied up with quota related
RPCs like this. Any help in narrowing down what the issue is?



Waiting 3.8729 sec since 19:54:09, monitored, thread 32786 Msg handler
quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 4.3158 sec since 19:54:08, monitored, thread 32771 Msg handler
quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 4.3173 sec since 19:54:08, monitored, thread 35829 Msg handler
quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 4.4619 sec since 19:54:08, monitored, thread 9694 Msg handler
quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 4.4967 sec since 19:54:08, monitored, thread 32357 Msg handler
quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 4.6885 sec since 19:54:08, monitored, thread 32305 Msg handler
quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 4.7123 sec since 19:54:08, monitored, thread 32261 Msg handler
quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 4.7932 sec since 19:54:08, monitored, thread 53409 Msg handler
quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 5.2954 sec since 19:54:07, monitored, thread 32905 Msg handler
quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 5.3058 sec since 19:54:07, monitored, thread 32573 Msg handler
quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 5.3207 sec since 19:54:07, monitored, thread 32397 Msg handler
quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 5.3274 sec since 19:54:07, monitored, thread 32897 Msg handler
quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 5.3343 sec since 19:54:07, monitored, thread 32691 Msg handler
quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 5.3347 sec since 19:54:07, monitored, thread 32364 Msg handler
quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'

Waiting 5.3348 sec since 19:54:07, monitored, thread 32522 Msg handler
quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
'waiting for WA lock'



Bob Oesterlin
Sr Principal Storage Engineer, Nuance
507-269-0413

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
