Which version of Spectrum Scale is running there? Have these errors appeared only since your last version update?

On 14.02.23 at 14:09, Walter Sklenka wrote:

Dear Colleagues!

May I ask if anyone has a hint as to what could cause Critical Thread Watchdog warnings for the disk lease thread (DiskLeaseThread)?

Is this a “local node” problem or a network problem?

I sometimes see these messages on NSD servers that also serve as NFS servers, when they come under heavy NFS load.

The following is an excerpt from mmfs.log.latest:

2023-02-14_12:06:53.235+0100: [N] Disk lease period expired 0.040 seconds ago in cluster xxx-cluster. Attempting to reacquire the lease.

2023-02-14_12:06:53.600+0100: [W] ------------------[GPFS Critical Thread Watchdog]------------------

2023-02-14_12:06:53.600+0100: [W] PID: 7294 State: R (DiskLeaseThread) is overloaded for more than 8 seconds

2023-02-14_12:06:53.600+0100: [W] counter: 0 (mark-idle: 0 mark-active: 0 pre-work: 0 post-work: 0) sched: (nvcsw: 0 nivcsw: 8)

2023-02-14_12:06:53.600+0100: [W] Call Trace(PID: 7294):

2023-02-14_12:06:53.600+0100: [W] #0: 0x000055CABDF49521 BaseMutexClass::release() + 0x12 at ??:0

2023-02-14_12:06:53.600+0100: [W] #1: 0xB1557721BBABD900 _etext + 0xB154F7E646041C0E at ??:0

2023-02-14_12:07:09.554+0100: [N] Disk lease reacquired in cluster xxx-cluster.

2023-02-14_12:07:09.554+0100: [N] Disk lease period expired 5.680 seconds ago in cluster xxx-cluster. Attempting to reacquire the lease.

2023-02-14_12:07:11.605+0100: [N] Disk lease reacquired in cluster xxx-cluster.

2023-02-14_12:10:55.990+0100: [I] Command: mmlspool /dev/fs4vm all -L -Y

2023-02-14_12:10:55.990+0100: [I] Command: successful mmlspool /dev/fs4vm all -L -Y

2023-02-14_12:30:58.756+0100: [I] Command: mmlspool /dev/fs4vm all -L -Y

2023-02-14_12:30:58.756+0100: [I] Command: successful mmlspool /dev/fs4vm all -L -Y

2023-02-14_13:10:55.988+0100: [I] Command: mmlspool /dev/fs4vm all -L -Y

2023-02-14_13:10:55.989+0100: [I] Command: successful mmlspool /dev/fs4vm all -L -Y

2023-02-14_13:21:40.892+0100: [N] Node 10.20.30.2 (ogpfs2-hs.local) lease renewal is overdue. Pinging to check if it is alive

2023-02-14_13:21:40.892+0100: [I] The TCP connection to IP address 10.20.30.2 ogpfs2-hs.local <c0n1>:[1] (socket 106) state: state=1 ca_state=0 snd_cwnd=10 snd_ssthresh=2147483647 unacked=0 probes=0 backoff=0 retransmits=0 rto=201000 rcv_ssthresh=1219344 rtt=121 rttvar=69 sacked=0 retrans=0 reordering=3 lost=0

2023-02-14_13:22:00.220+0100: [N] Disk lease period expired 0.010 seconds ago in cluster xxx-cluster. Attempting to reacquire the lease.

2023-02-14_13:22:08.298+0100: [N] Disk lease reacquired in cluster xxx-cluster.

2023-02-14_13:30:58.760+0100: [I] Command: mmlspool /dev/fs4vm all -L -Y

2023-02-14_13:30:58.760+0100: [I] Command: successful mmlspool /dev/fs4vm all -L -Y
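
To put numbers on how long the node went without a lease each time, the "expired" and "reacquired" messages in an excerpt like the one above can be paired up. The following is only a minimal Python sketch, assuming the timestamp and message formats exactly as they appear in this excerpt; the function name lease_gaps is made up for illustration and is not part of Spectrum Scale.

```python
import re
from datetime import datetime

# Timestamp layout as seen in the excerpt, e.g. 2023-02-14_12:06:53.235+0100
TS_FORMAT = "%Y-%m-%d_%H:%M:%S.%f%z"

# "<timestamp>: [<severity letter>] <message>"
LINE_RE = re.compile(r"^(\S+): \[(\w)\] (.*)$")


def lease_gaps(lines):
    """Yield (expired_at, seconds_until_reacquired) for each lease expiry.

    Assumption: every "Disk lease period expired" message is eventually
    followed by a "Disk lease reacquired" message from the same node.
    """
    pending = None  # timestamp of the most recent unmatched expiry
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # blank lines or anything that is not a log record
        stamp, _severity, message = match.groups()
        if message.startswith("Disk lease period expired"):
            pending = datetime.strptime(stamp, TS_FORMAT)
        elif message.startswith("Disk lease reacquired") and pending is not None:
            reacquired = datetime.strptime(stamp, TS_FORMAT)
            yield pending, (reacquired - pending).total_seconds()
            pending = None
```

Fed the lines of the excerpt above, this should report three gaps of roughly 16.3, 2.1 and 8.1 seconds. Gaps that line up with the NFS load peaks would point more towards the local node than towards the network, but that is an interpretation, not something the log states.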

Kind regards
Walter Sklenka
Technical Consultant


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
