Hi, an odd prefetch strategy would affect read performance, but the write latency reported here is even worse ... Have you checked what the actual IO performance of the V5k box under that load is, and how it compares to the nominal performance of the box and of its disks? How is the storage organised? How many LUNs/NSDs? What RAID code (the V5k cannot do declustered RAID, can it?)? Any thin provisioning or other gimmicks in the game? What IO sizes? Tons of things to look at.
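For a quick first pass from the GPFS side, something like the sketch below, run on an NSD server, summarises recent I/O sizes and service times as seen by GPFS. The awk field positions ($5 = I/O size in 512-byte sectors, $6 = service time in ms) are an assumption about the local "mmdiag --iohist" output format and may need adjusting:

  mmdiag --iohist | awk '
      $2 == "R" || $2 == "W" {              # data lines only
          n++; sec += $5; ms += $6          # assumed columns: $5 nSec, $6 time ms
          if ($6 > max) max = $6
      }
      END {
          if (n) printf("%d IOs, avg size %.0f KiB, avg svc %.1f ms, max %.1f ms\n",
                        n, sec/n/2, ms/n, max)
      }'

Comparing those numbers against the nominal latency of the backing disks should show fairly quickly whether the V5k itself is the bottleneck.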
Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Hybrid Cloud Infrastructure / Technology Consulting & Implementation Services
+49 175 575 2877 Mobile
Rochlitzer Str. 19, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services
IBM Data Privacy Statement
IBM Deutschland Business & Technology Services GmbH
Management (Geschäftsführung): Sven Schooss, Stefan Hierl
Registered office (Sitz der Gesellschaft): Ehningen
Register court (Registergericht): Amtsgericht Stuttgart, HRB 17122


From: Jan-Frode Myklebust <janfr...@tanso.net>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 28/05/2021 19:50
Subject: [EXTERNAL] Re: [gpfsug-discuss] Long IO waiters and IBM Storwize V5030
Sent by: gpfsug-discuss-boun...@spectrumscale.org

One thing to check: Storwize/SVC code will *always* guess wrong on prefetching for GPFS. You can see this as a much higher read data throughput on the mdisks than on the vdisks in the webui. To fix it, disable prefetching with "chsystem -cache_prefetch off" (a short CLI sketch follows after the quoted thread below). As this is a global setting, you should probably only set it if the system is used exclusively for GPFS.

  -jf

On Fri, May 28, 2021 at 5:58 PM Saula, Oluwasijibomi <oluwasijibomi.sa...@ndsu.edu> wrote:

Hi Folks,

So, we are experiencing some very long IO waiters in our GPFS cluster:

# mmdiag --waiters
=== mmdiag: waiters ===
Waiting 17.3823 sec since 10:41:01, monitored, thread 21761 NSDThread: for I/O completion
Waiting 16.6140 sec since 10:41:02, monitored, thread 21730 NSDThread: for I/O completion
Waiting 15.3004 sec since 10:41:03, monitored, thread 21763 NSDThread: for I/O completion
Waiting 15.2013 sec since 10:41:03, monitored, thread 22175

However, GPFS support is pointing to our IBM Storwize V5030 disk system as the source of the latency. Unfortunately, we don't have paid support for the system, so we are polling for anyone who might be able to assist.

Does anyone by chance have experience with the IBM Storwize V5030, or possess a problem determination guide for it? We've briefly reviewed the V5030 management portal, but we still haven't identified a cause for the increased latencies (read ~129 ms, write ~198 ms). Granted, we have some heavy client workloads, yet we seem to experience this drastic drop in performance every couple of months, probably exacerbated by heavy IO demands.

Any assistance would be much appreciated.

Thanks,

Oluwasijibomi (Siji) Saula
HPC Systems Administrator / Information Technology
Research 2 Building 220B / Fargo ND 58108-6050
p: 701.231.7749 / www.ndsu.edu

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
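On the CLI sketch mentioned in Jan-Frode's reply: a minimal version, assuming the cache_prefetch attribute appears in "lssystem" output on the V5030 code level (worth verifying against the CLI reference first, since chsystem acts cluster-wide):

  # show whether controller-side prefetching is currently enabled
  # (attribute name in lssystem output is an assumption for this code level)
  lssystem | grep cache_prefetch

  # disable prefetching -- a global setting, so only on a GPFS-only system
  chsystem -cache_prefetch off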