All,

Messages like

[W] ------------------[GPFS Critical Thread Watchdog]------------------

indicate that a “critical thread”, in this case the lease thread, was 
apparently blocked for longer than expected. This is usually caused not by 
network delays but by conditions on the local node: possibly excessive CPU 
load, blocking while accessing the local file system, or mutex contention.

Do you have other samples of the message, with a more complete stack trace?   
Or was the instance below the only one?
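
One way to look for further samples is to scan the log for watchdog entries. What follows is only a minimal sketch. It assumes the log holds one entry per line (not wrapped as in the quoted mail below) and that each watchdog entry is the header line followed directly by further [W] lines. `watchdog_blocks` and `SAMPLE` are hypothetical names for illustration, not part of GPFS; the sample lines are taken from the excerpt quoted below.

```python
HEADER = "[GPFS Critical Thread Watchdog]"

# A few unwrapped lines from the quoted mmfs.log.latest excerpt.
SAMPLE = """\
2023-02-14_12:06:53.235+0100: [N] Disk lease period expired 0.040 seconds ago in cluster xxx-cluster. Attempting to reacquire the lease.
2023-02-14_12:06:53.600+0100: [W] ------------------[GPFS Critical Thread Watchdog]------------------
2023-02-14_12:06:53.600+0100: [W] PID: 7294 State: R (DiskLeaseThread) is overloaded for more than 8 seconds
2023-02-14_12:06:53.600+0100: [W] Call Trace(PID: 7294):
2023-02-14_12:06:53.600+0100: [W] #0: 0x000055CABDF49521 BaseMutexClass::release() + 0x12 at ??:0
2023-02-14_12:07:09.554+0100: [N] Disk lease reacquired in cluster xxx-cluster.
"""


def watchdog_blocks(lines):
    """Group each watchdog header with the [W] lines that directly follow it."""
    blocks = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if HEADER in line:
            # A new header closes any block that is still open.
            if current is not None:
                blocks.append(current)
            current = [line]
        elif current is not None and "[W]" in line:
            current.append(line)
        elif current is not None:
            # First non-warning line ends the block.
            blocks.append(current)
            current = None
    if current is not None:
        blocks.append(current)
    return blocks


blocks = watchdog_blocks(SAMPLE.splitlines())
for block in blocks:
    print("\n".join(block))
    print()
```

To run this against a real log, pass an open file object instead of `SAMPLE.splitlines()`; the log path depends on the installation.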

  Felipe

----
Felipe Knop        [email protected]
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601


From: gpfsug-discuss <[email protected]> on behalf of Walter 
Sklenka <[email protected]>
Reply-To: gpfsug main discussion list <[email protected]>
Date: Tuesday, February 14, 2023 at 10:49 AM
To: "[email protected]" <[email protected]>
Subject: [EXTERNAL] Re: [gpfsug-discuss] Reasons for DiskLeaseThread Overloaded

Hi!
I started with 5.1.6.0 and am now at 5.1.6.1:

[root@ogpfs1 ~]# mmfsadm dump version
Dump level: verbose
Build branch "5.1.6.1 ".

The messages have appeared from the beginning.



From: gpfsug-discuss <[email protected]> On Behalf Of Christian 
Vieser
Sent: Tuesday, February 14, 2023 15:34
To: [email protected]
Subject: Re: [gpfsug-discuss] Reasons for DiskLeaseThread Overloaded


What version of Spectrum Scale is running there? Do these errors appear since 
your last version update?
On 14.02.23 at 14:09, Walter Sklenka wrote:
Dear Colleagues!
May I ask if anyone has a hint as to what could cause Critical Thread 
Watchdog warnings for the DiskLeaseThread?
Is this a “local node” problem or a network problem?
I sometimes see these messages on NSD servers that also serve as NFS 
servers, when they come under heavy NFS load.



The following is an excerpt from mmfs.log.latest:

2023-02-14_12:06:53.235+0100: [N] Disk lease period expired 0.040 seconds ago 
in cluster xxx-cluster. Attempting to reacquire the lease.
2023-02-14_12:06:53.600+0100: [W] ------------------[GPFS Critical Thread 
Watchdog]------------------
2023-02-14_12:06:53.600+0100: [W] PID: 7294 State: R (DiskLeaseThread) is 
overloaded for more than 8 seconds
2023-02-14_12:06:53.600+0100: [W]  counter: 0 (mark-idle: 0 mark-active: 0 
pre-work: 0 post-work: 0) sched: (nvcsw: 0 nivcsw: 8)
2023-02-14_12:06:53.600+0100: [W] Call Trace(PID: 7294):
2023-02-14_12:06:53.600+0100: [W] #0: 0x000055CABDF49521 
BaseMutexClass::release() + 0x12 at ??:0
2023-02-14_12:06:53.600+0100: [W] #1: 0xB1557721BBABD900 _etext + 
0xB154F7E646041C0E at ??:0
2023-02-14_12:07:09.554+0100: [N] Disk lease reacquired in cluster xxx-cluster.
2023-02-14_12:07:09.554+0100: [N] Disk lease period expired 5.680 seconds ago 
in cluster xxx-cluster. Attempting to reacquire the lease.
2023-02-14_12:07:11.605+0100: [N] Disk lease reacquired in cluster xxx-cluster.
2023-02-14_12:10:55.990+0100: [I] Command: mmlspool /dev/fs4vm all -L -Y
2023-02-14_12:10:55.990+0100: [I] Command: successful mmlspool /dev/fs4vm all 
-L -Y
2023-02-14_12:30:58.756+0100: [I] Command: mmlspool /dev/fs4vm all -L -Y
2023-02-14_12:30:58.756+0100: [I] Command: successful mmlspool /dev/fs4vm all 
-L -Y
2023-02-14_13:10:55.988+0100: [I] Command: mmlspool /dev/fs4vm all -L -Y
2023-02-14_13:10:55.989+0100: [I] Command: successful mmlspool /dev/fs4vm all 
-L -Y
2023-02-14_13:21:40.892+0100: [N] Node 10.20.30.2 (ogpfs2-hs.local) lease 
renewal is overdue. Pinging to check if it is alive
2023-02-14_13:21:40.892+0100: [I] The TCP connection to IP address 10.20.30.2 
ogpfs2-hs.local <c0n1>:[1] (socket 106) state: state=1 ca_state=0 snd_cwnd=10 
snd_ssthresh=2147483647 unacked=0 probes=0 backoff=0 retransmits=0 rto=201000 
rcv_ssthresh=1219344 rtt=121 rttvar=69 sacked=0 retrans=0 reordering=3 lost=0
2023-02-14_13:22:00.220+0100: [N] Disk lease period expired 0.010 seconds ago 
in cluster xxx-cluster. Attempting to reacquire the lease.
2023-02-14_13:22:08.298+0100: [N] Disk lease reacquired in cluster xxx-cluster.
2023-02-14_13:30:58.760+0100: [I] Command: mmlspool /dev/fs4vm all -L -Y
2023-02-14_13:30:58.760+0100: [I] Command: successful mmlspool /dev/fs4vm all 
-L -Y
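
To put numbers on the lease gaps in an excerpt like this, the "expired" and "reacquired" entries can be paired up. This is only a rough sketch: it assumes one log entry per line (unwrapped), the timestamp layout seen above, and that each "reacquired" entry belongs to the most recent "expired" entry. `lease_outages` and `SAMPLE` are hypothetical names, not GPFS tooling.

```python
import re
from datetime import datetime

TS_FORMAT = "%Y-%m-%d_%H:%M:%S.%f%z"
EXPIRED = re.compile(r"Disk lease period expired ([\d.]+) seconds ago")
REACQUIRED = "Disk lease reacquired"

# Lease-related lines from the excerpt, one entry per line.
SAMPLE = """\
2023-02-14_12:06:53.235+0100: [N] Disk lease period expired 0.040 seconds ago in cluster xxx-cluster. Attempting to reacquire the lease.
2023-02-14_12:07:09.554+0100: [N] Disk lease reacquired in cluster xxx-cluster.
2023-02-14_12:07:09.554+0100: [N] Disk lease period expired 5.680 seconds ago in cluster xxx-cluster. Attempting to reacquire the lease.
2023-02-14_12:07:11.605+0100: [N] Disk lease reacquired in cluster xxx-cluster.
2023-02-14_13:22:00.220+0100: [N] Disk lease period expired 0.010 seconds ago in cluster xxx-cluster. Attempting to reacquire the lease.
2023-02-14_13:22:08.298+0100: [N] Disk lease reacquired in cluster xxx-cluster.
"""


def lease_outages(lines):
    """Return (expiry_notice_time, seconds_overdue, seconds_until_reacquired)."""
    outages = []
    pending = None  # last "expired" entry not yet matched by a "reacquired"
    for raw in lines:
        stamp, _, rest = raw.strip().partition(": ")
        match = EXPIRED.search(rest)
        if match:
            pending = (datetime.strptime(stamp, TS_FORMAT), float(match.group(1)))
        elif REACQUIRED in rest and pending is not None:
            start, overdue = pending
            end = datetime.strptime(stamp, TS_FORMAT)
            outages.append((start, overdue, (end - start).total_seconds()))
            pending = None
    return outages


outages = lease_outages(SAMPLE.splitlines())
for start, overdue, waited in outages:
    print(f"{start.isoformat()}  overdue {overdue:.3f}s  reacquired after {waited:.3f}s")
```

For the entries above this gives gaps of roughly 16.3, 2.1 and 8.1 seconds between the expiry notice and the reacquisition.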
Kind regards
Walter Sklenka
Technical Consultant



_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
