It absolutely does, thanks Olaf!
The tasks running on these nodes are also running on 63 other nodes and
generating ~60K IOPS of metadata writes and, I *think*, about the same in
reads. Do you think that could be contributing to the higher waiter
times? I'm not sure quite what the job is up to; it's seemingly doing
very little data movement, the CPU %used is very low, but the load is
rather high.
-Aaron
On 10/15/16 11:23 AM, Olaf Weiser wrote:
From your file system configuration (mmlsfs <dev> -L) you'll find the
size of the log.
Since release 4.x you can change it, but you need to re-mount the FS
on every client to make the change effective.
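As a sketch (the device name `gpfs0` and the 32M size are illustrative assumptions, not values from this thread):

```shell
# Show the configured log file size for a file system
# (look for the "-L  Logfile size" attribute in the output)
mmlsfs gpfs0 -L

# Change it (possible since GPFS 4.x); the new size only takes
# effect once the file system is re-mounted on every client
mmchfs gpfs0 -L 32M
```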
When a client initiates writes/changes to GPFS, it needs to record those
changes in the log. If the log nears a certain filling degree, GPFS
triggers so-called logWrapThreads to write content to disk and so free
space.
With your given numbers (double-digit [ms] waiter times) your FS
probably gets slowed down, and there's something suspect with the
storage, because log I/Os are rather small and should not take that long.
To give you an example from a healthy environment: the I/O times are so
small that you usually don't see waiters for this.
I/O start time   RW  Buf type   disk:sectorNum  nSec  time ms  tag1  tag2    Disk UID           typ  NSD node       context  thread
---------------  --  ---------  --------------  ----  -------  ----  ------  -----------------  ---  -------------  -------  ------
06:23:32.358851  W   logData    2:524306424     8     0.439    0     0       C0A70D08:57CF40D1  cli  192.167.20.17  LogData  SGExceptionLogBufferFullThread
06:23:33.576367  W   logData    1:524257280     8     0.646    0     0       C0A70D08:57CF40D0  cli  192.167.20.16  LogData  SGExceptionLogBufferFullThread
06:23:32.212426  W   iallocSeg  1:524490048     64    0.733    2     245     C0A70D08:57CF40D0  cli  192.167.20.16  Logwrap  LogWrapHelperThread
06:23:32.212412  W   logWrap    2:524552192     8     0.755    0     179200  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
06:23:32.212432  W   logWrap    2:525162760     8     0.737    0     125473  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
06:23:32.212416  W   iallocSeg  2:524488384     64    0.763    2     347     C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
06:23:32.212414  W   logWrap    2:525266944     8     2.160    0     177664  C0A70D08:57CF40D1  cli  192.167.20.17  Logwrap  LogWrapHelperThread
hope this helps ..
Mit freundlichen Grüßen / Kind regards
Olaf Weiser
EMEA Storage Competence Center Mainz, Germany / IBM Systems, Storage
Platform
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
IBM Allee 1
71139 Ehningen
Phone: +49-170-579-44-66
E-Mail: olaf.wei...@de.ibm.com
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Martina Koederitz (Vorsitzende), Susanne Peter,
Norbert Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht
Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940
From: Aaron Knister <aaron.s.knis...@nasa.gov>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 10/15/2016 07:23 AM
Subject: [gpfsug-discuss] SGExceptionLogBufferFullThread waiter
Sent by: gpfsug-discuss-boun...@spectrumscale.org
------------------------------------------------------------------------
I've got a node that's got some curious waiters on it (see below). Could
someone explain what the "SGExceptionLogBufferFullThread" waiter means?
Thanks!
-Aaron
=== mmdiag: waiters ===
0x7FFFF040D600 waiting 0.038822715 seconds,
SGExceptionLogBufferFullThread: on ThCond 0x7FFFDBB07628
(0x7FFFDBB07628) (parallelWaitCond), reason 'wait for parallel write'
for NSD I/O completion on node 10.1.53.5 <c0n20>
0x7FFFE83F3D60 waiting 0.039629116 seconds, CleanBufferThread: on ThCond
0x17B1488 (0x17B1488) (MsgRecordCondvar), reason 'RPC wait' for NSD I/O
completion on node 10.1.53.7 <c0n22>
0x7FFFE8373A90 waiting 0.038921480 seconds, CleanBufferThread: on ThCond
0x7FFFCD2B4E30 (0x7FFFCD2B4E30) (LogFileBufferDescriptorCondvar), reason
'force wait on force active buffer write'
0x42CD9B0 waiting 0.028227004 seconds, CleanBufferThread: on ThCond
0x7FFFCD2B4E30 (0x7FFFCD2B4E30) (LogFileBufferDescriptorCondvar), reason
'force wait for buffer write to complete'
0x7FFFE0F0EAD0 waiting 0.027864343 seconds, CleanBufferThread: on ThCond
0x7FFFDC0EEA88 (0x7FFFDC0EEA88) (MsgRecordCondvar), reason 'RPC wait'
for NSD I/O completion on node 10.1.53.7 <c0n22>
0x1575560 waiting 0.028045975 seconds, RemoveHandlerThread: on ThCond
0x18020CE4E08 (0xFFFFC90020CE4E08) (LkObjCondvar), reason 'waiting for
LX lock'
0x1570560 waiting 0.038724949 seconds, CreateHandlerThread: on ThCond
0x18020CE50A0 (0xFFFFC90020CE50A0) (LkObjCondvar), reason 'waiting for
LX lock'
0x1563D60 waiting 0.073919918 seconds, RemoveHandlerThread: on ThCond
0x180235F6440 (0xFFFFC900235F6440) (LkObjCondvar), reason 'waiting for
LX lock'
0x1561560 waiting 0.054854513 seconds, RemoveHandlerThread: on ThCond
0x1802292D200 (0xFFFFC9002292D200) (LkObjCondvar), reason 'waiting for
LX lock'
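Waiter output like the above can be tedious to eyeball on a busy node. A quick way to aggregate it by thread type is a sketch like this (the `summarize` helper is illustrative and assumes the `mmdiag --waiters` line format shown above; the embedded sample is a trimmed copy of two of the entries):

```python
import re

def summarize(waiters_text):
    """Aggregate mmdiag-style waiter lines by thread name.

    Returns {thread_name: (count, longest_wait_seconds)}. Relies on the
    'waiting N.NNN seconds, ThreadName:' pattern appearing on one line,
    as in the output above.
    """
    stats = {}
    for match in re.finditer(r"waiting ([0-9.]+) seconds, (\w+):", waiters_text):
        secs, thread = float(match.group(1)), match.group(2)
        count, longest = stats.get(thread, (0, 0.0))
        stats[thread] = (count + 1, max(longest, secs))
    return stats

SAMPLE = """\
0x42CD9B0 waiting 0.028227004 seconds, CleanBufferThread: on ThCond
0x7FFFCD2B4E30 (0x7FFFCD2B4E30) (LogFileBufferDescriptorCondvar), reason
'force wait for buffer write to complete'
0x1575560 waiting 0.028045975 seconds, RemoveHandlerThread: on ThCond
0x18020CE4E08 (0xFFFFC90020CE4E08) (LkObjCondvar), reason 'waiting for
LX lock'
"""

for thread, (count, longest) in sorted(summarize(SAMPLE).items()):
    print(f"{thread}: {count} waiter(s), longest {longest:.3f}s")
```

In practice you'd feed it the output of `mmdiag --waiters` directly instead of the embedded sample.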
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss