Thanks, Olaf. I ended up un-setting a bunch of settings that are now
auto-tuned (worker1Threads, worker3Threads, etc.) and just set
workerThreads as you suggest. That, combined with increasing
maxFilesToCache above the workload's maximum number of concurrently
open files, got me consistently within 1% to 3% of the performance of
the same storage hardware running btrfs instead of GPFS. I think that's
pretty darned good considering the additional complexity GPFS has over
btrfs as a clustered filesystem. Plus I now get NFS server failover
for very little effort and without having to deal with corosync or
pacemaker.
-Aaron
On 9/11/17 4:11 AM, Olaf Weiser wrote:
Hi Aaron,
A 0.0009 s response time for your metadata IO suggests a very good/fast
storage backend, which is hard to improve on.
You can raise the parallelism a bit for accessing metadata, but whether
that will improve your workload is not assured.
The worker3Threads parameter specifies the number of threads to use for
inode prefetch. Usually I would suggest that you no longer touch
individual parameters: with the great improvements of the last few
releases, GPFS can calculate the right settings semi-automatically, and
you only need to set the simpler workerThreads. But in your case you
can check whether this more specific value helps you out.
Depending on your blocksize and average file size, you may see
additional improvements from tuning nfsPrefetchStrategy, which tells
GPFS to consider all IOs within N block boundaries as sequential and
start prefetching.
Last but not least, set ignorePrefetchLUNCount to yes (if not already
done); this helps GPFS use all available workerThreads.
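As a configuration sketch, these two suggestions look like the following; the nfsPrefetchStrategy value and the node class name (cnfsNodes) are examples, not recommendations:

```shell
# Treat IOs falling within N block boundaries as sequential so that
# GPFS starts prefetching (example: 2 block boundaries)
mmchconfig nfsPrefetchStrategy=2 -N cnfsNodes

# Size prefetch parallelism by workerThreads rather than by the number
# of visible LUNs; -i applies the change immediately where supported
mmchconfig ignorePrefetchLUNCount=yes -i -N cnfsNodes
```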
cheers
olaf
From: Aaron Knister <[email protected]>
To: <[email protected]>
Date: 09/11/2017 02:50 AM
Subject: Re: [gpfsug-discuss] tuning parameters question
Sent by: [email protected]
------------------------------------------------------------------------
As an aside, my initial attempt was to use Ganesha via CES but the
performance was significantly worse than CNFS for this workload. The
docs seem to suggest that CNFS performs better for metadata intensive
workloads which certainly seems to fit the bill here.
-Aaron
On 9/10/17 8:43 PM, Aaron Knister wrote:
> Hi All (but mostly Sven),
>
> I stumbled across this great gem:
>
> files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf
>
> and I'm wondering which, if any, of those tuning parameters are still
> relevant with the 4.2.3 code. Specifically for a CNFS cluster. I'm
> exporting a gpfs fs as an NFS root to 1k nodes. The boot storm is
> particularly ugly and the storage doesn't appear to be bottlenecked.
>
> I see a lot of waiters like these:
>
> Waiting 0.0009 sec since 20:41:31, monitored, thread 2881
> InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar),
> reason 'waiting for LX lock'
> Waiting 0.0009 sec since 20:41:31, monitored, thread 26231
> InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar),
> reason 'waiting for LX lock'
> Waiting 0.0009 sec since 20:41:31, monitored, thread 26146
> InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar),
> reason 'waiting for LX lock'
> Waiting 0.0009 sec since 20:41:31, monitored, thread 18637
> InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar),
> reason 'waiting for LX lock'
> Waiting 0.0009 sec since 20:41:31, monitored, thread 25013
> InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar),
> reason 'waiting for LX lock'
> Waiting 0.0009 sec since 20:41:31, monitored, thread 27879
> InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar),
> reason 'waiting for LX lock'
> Waiting 0.0009 sec since 20:41:31, monitored, thread 26553
> InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar),
> reason 'waiting for LX lock'
> Waiting 0.0009 sec since 20:41:31, monitored, thread 25334
> InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar),
> reason 'waiting for LX lock'
> Waiting 0.0009 sec since 20:41:31, monitored, thread 25337
> InodePrefetchWorkerThread: on ThCond 0x1800635A120 (LkObjCondvar),
> reason 'waiting for LX lock'
>
> and I'm wondering if there's anything immediate one would suggest to
> help with that.
>
> -Aaron
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss