You know, I hadn't seen the original message when I commented earlier; sorry about that.
It may sound dumb, but have you tried tuning downwards, as Zdenek says? Put the storage on a single RAID group with a 1MB block size, and then move up from there. We're on practically ancient infrastructure compared to what you have and get equivalent throughput.

- Try returning the page pool to 8GB; maybe you're prefetching TOO much data. (The same effect happens on Oracle: given too much memory, everything slows to a crawl as it tries to keep every DB in memory.)
- Define your FS on a single RAID group and see what your max performance is. (Are you sure you're not wrapping any disk/controller access, with so many platters being engaged adding latency?)
- Tune your block size to 1MB. (The large block size could add latency, depending on how the storage handles that chunk.)

Once you've got a baseline on a more normal configuration, you can scale up one variable at a time and see what gives better or worse performance. There's a rough command sketch below Zdenek's quoted message.

At IBM labs we were able to achieve 2.5GB/s on a single thread and 18GB/s to an ESS from a single Linux VM, and our GPFS configuration was more or less baseline, with a 1MB block size if I remember right.

I'm building on Zdenek's answer, as it's likely the step I would take in this instance: look for areas where the scale of your configuration could be introducing latencies.

Alec

On Fri, Feb 9, 2024 at 12:21 AM Zdenek Salvet <[email protected]> wrote:

> On Thu, Feb 08, 2024 at 02:59:15PM +0000, Michal Hruška wrote:
> > @Uwe
> > Using iohist we found out that GPFS is overloading one dm-device
> > (it took about 500ms to finish IOs). We replaced the "problematic"
> > dm-device (as we have enough drives to play with) with a new one,
> > but the overloading issue just jumped to another dm-device.
> > We believe that this behaviour is caused by GPFS, but we are unable
> > to locate the root cause of it.
>
> Hello,
> this behaviour could be caused by an asymmetry in the data paths
> of your storage; a relatively small imbalance can make the request
> queue of a slightly slower disk grow seemingly disproportionately.
>
> In general, I think you need to scale your GPFS parameters down, not up,
> in order to force better write clustering and achieve the top speed
> of rotational disks, unless the array controllers use huge cache memory.
> If you can change your benchmark workload, try synchronous writes
> (dd oflag=dsync ...).
>
> Best regards,
> Zdenek Salvet                                 [email protected]
> Institute of Computer Science of Masaryk University, Brno, Czech Republic
> and CESNET, z.s.p.o., Prague, Czech Republic
> Phone: ++420-549 49 6534                      Fax: ++420-541 212 747
>
> ----------------------------------------------------------------------------
>       Teamwork is essential -- it allows you to blame someone else.
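
P.S. For concreteness, here's a minimal sketch of that baseline run. These are standard GPFS administration commands, but the filesystem name (fs1), the stanza file, the mount path, and the sizes are placeholders I've made up; adjust them to your environment and check the flags against your release.

    # Pull the page pool back to 8G (restart GPFS afterwards, or use -i
    # on releases that apply the change immediately)
    mmchconfig pagepool=8G

    # Scratch filesystem on NSDs carved from a single RAID group, with a
    # 1M block size; assumes the NSDs listed in the stanza file were
    # already created with mmcrnsd
    mmcrfs fs1 -F single_raid_group.stanza -B 1M
    mmmount fs1 -a

    # Baseline with synchronous writes, as Zdenek suggests (10 GiB file)
    dd if=/dev/zero of=/gpfs/fs1/ddtest bs=1M count=10240 oflag=dsync

    # While the test runs, watch whether one dm-device still piles up IOs
    mmdiag --iohist

Keep in mind the block size is fixed at mmcrfs time, so each scale-up step on that variable means recreating the scratch filesystem; the page pool you can grow back in place with mmchconfig.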
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
