Hi Kenneth,
is prefetching switched on or off at your storage backend?
Raw sequential access looks very different from GPFS sequential access at
the storage device!
GPFS does its own prefetching; the storage would never know which sectors
a sequential read at the GPFS level maps to at the storage level!
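To illustrate the mapping point above, here is a minimal Python sketch. It is not GPFS's real allocation logic: it assumes a toy file system that stripes file blocks round-robin over a few NSDs and places each block on a randomly chosen free slot of its LUN. The names `toy_layout`, `sequential_fraction`, `num_nsds` and `slots_per_nsd` are hypothetical.

```python
import random


def toy_layout(num_blocks, num_nsds, slots_per_nsd, seed=0):
    """Map file block i -> (nsd index, slot on that NSD) in a toy model.

    Assumption: blocks are striped round-robin across NSDs and each block
    lands on a randomly chosen free slot of its NSD. Illustration only,
    not the real GPFS allocator.
    """
    rng = random.Random(seed)
    # One shuffled pool of free slots per NSD.
    free = [list(range(slots_per_nsd)) for _ in range(num_nsds)]
    for slots in free:
        rng.shuffle(slots)
    layout = []
    for i in range(num_blocks):
        nsd = i % num_nsds          # round-robin striping
        layout.append((nsd, free[nsd].pop()))
    return layout


def sequential_fraction(layout, nsd):
    """Fraction of consecutive requests on one NSD that hit adjacent slots."""
    slots = [slot for n, slot in layout if n == nsd]
    if len(slots) < 2:
        return 0.0
    adjacent = sum(1 for a, b in zip(slots, slots[1:]) if b == a + 1)
    return adjacent / (len(slots) - 1)
```

Under these assumptions, calling `sequential_fraction(toy_layout(4000, 4, 100000), 0)` gives a value close to zero: a read that is perfectly sequential in the file looks essentially random to each LUN, so backend read-ahead has little to work with.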
Kind regards
Dr. Uwe Falke
IT Specialist
High Performance Computing Services / Integrated Technology Services /
Data Center Services
-------------------------------------------------------------------------------------------------------------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: [email protected]
-------------------------------------------------------------------------------------------------------------------------------------------
From: Kenneth Waegeman <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 04/20/2017 04:53 PM
Subject: Re: [gpfsug-discuss] bizarre performance behavior
Sent by: [email protected]
Hi,
Having an issue that looks the same as this one:
We can do sequential writes to the filesystem at 7.8 GB/s in total, which
is the expected speed for our current storage backend. While we get even
better performance with sequential reads on raw storage LUNs, using GPFS
we can only reach 1 GB/s in total (each NSD server seems limited to
0.5 GB/s), independent of the number of clients (1, 2, 4, ...) or the
tools we tested with (fio, dd). We played with blockdev params, maxMBpS,
prefetchThreads, hyperthreading, C1E/C-states, etc., as discussed in this
thread, but nothing seems to impact this read performance.
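For readers who want to reproduce a comparison like the one above without fio or dd, here is a rough Python sketch of a sequential read timer. It is far cruder than either tool: it is single-threaded, and the OS page cache can inflate the result on repeated runs. The function name and the 4 MiB default block size are assumptions, not values from this thread.

```python
import time


def sequential_read_throughput(path, block_size=4 * 1024 * 1024):
    """Read a file front to back; return (bytes_read, bytes_per_second)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python-level buffering; the OS page cache
    # still applies, so use a file larger than RAM or drop caches first.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:           # empty bytes object means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

Running it once against a raw LUN device and once against a large file on the GPFS mount gives two numbers that can be compared directly, with the caveats above.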
Any ideas?
Thanks!
Kenneth
On 17/02/17 19:29, Jan-Frode Myklebust wrote:
I just had a similar experience with a SanDisk InfiniFlash system
SAS-attached to a single host. gpfsperf reported 3.2 GByte/s for writes
and 250-300 MByte/s on sequential reads!! Random reads were on the order
of 2 GByte/s.
After a bit of head scratching and fumbling around I found out that reducing
maxMBpS from 10000 to 100 fixed the problem! Digging further I found that
reducing prefetchThreads from the default of 72 to 32 also fixed it, while
leaving maxMBpS at 10000. I can now also read at 3.2 GByte/s.
Could something like this be the problem on your box as well?
-jf
On Fri, 17 Feb 2017 at 18:13, Aaron Knister <[email protected]> wrote:
Well, I'm somewhat scrounging for hardware. This is in our test
environment :) And yep, it's got the 2U gpu-tray in it although even
without the riser it has 2 PCIe slots onboard (excluding the on-board
dual-port mezz card) so I think it would make a fine NSD server even
without the riser.
-Aaron
On 2/17/17 11:43 AM, Simon Thompson (Research Computing - IT Services)
wrote:
Maybe it's related to interrupt handlers somehow? You drive the load up
on one socket, and that pushes all the interrupt handling to the other
socket, where the fabric card is attached?
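One way to check this idea is to see which CPUs are actually servicing the fabric card's interrupts. The Python sketch below sums per-CPU counts from the text of /proc/interrupts for every line whose description contains a given substring. The driver name "mlx4" in the usage note is an assumption, since the thread does not say which HCA driver is in use; the function name is hypothetical as well.

```python
def irq_counts_per_cpu(text, pattern):
    """Sum per-CPU interrupt counts for lines whose description matches.

    `text` is the content of /proc/interrupts; `pattern` is a plain
    substring to look for in the description (e.g. a driver name).
    """
    lines = text.strip().splitlines()
    cpus = lines[0].split()                 # header: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        # Skip summary lines (e.g. ERR) that have no per-CPU columns.
        if len(counts) == len(cpus) and pattern in description:
            for i, count in enumerate(counts):
                totals[i] += count
    return dict(zip(cpus, totals))
```

Usage would be along the lines of `irq_counts_per_cpu(open("/proc/interrupts").read(), "mlx4")`, taken twice a few seconds apart under load; the difference shows which CPUs (and hence which socket) handle the card's interrupts.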
Dunno ... (Though I am intrigued you use iDataPlex nodes as NSD servers;
I assume it's some 2U gpu-tray riser one or something!)
Simon
________________________________________
From: [email protected] [[email protected]] on behalf of Aaron Knister [[email protected]]
Sent: 17 February 2017 15:52
To: gpfsug main discussion list
Subject: [gpfsug-discuss] bizarre performance behavior
This is a good one. I've got an NSD server with 4x 16GB fibre
connections coming in and 1x FDR10 and 1x QDR connection going out to
the clients. I was having a really hard time getting anything resembling
sensible performance out of it (4-5Gb/s writes but maybe 1.2Gb/s for
reads). The back-end is a DDN SFA12K and I *know* it can do better than
that.
I don't remember quite how I figured this out, but simply by running
"openssl speed -multi 16" on the NSD server to drive up the load I saw
an almost 4x performance jump, which pretty much goes against every
sysadmin fiber in me (i.e. "drive up the cpu load with unrelated crap to
quadruple your i/o performance").
This feels like some type of C-state/frequency-scaling shenanigans that
I haven't quite nailed down yet. I booted the box with the following
kernel parameters "intel_idle.max_cstate=0 processor.max_cstate=0" which
didn't seem to make much of a difference. I also tried setting the
frequency governor to userspace and setting the minimum frequency to
2.6GHz (it's a 2.6GHz CPU). None of that really matters -- I still have
to run something to drive up the CPU load and then performance improves.
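To verify that governor and frequency settings like these actually took effect, one can read the Linux cpufreq sysfs files directly. The Python sketch below assumes the standard `scaling_governor` and `scaling_cur_freq` files under /sys/devices/system/cpu/cpuN/cpufreq (frequencies there are in kHz); the function names are hypothetical, and systems without a cpufreq driver loaded simply yield an empty result.

```python
import glob
import os


def read_cpufreq(root="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_khz)} from cpufreq sysfs files."""
    result = {}
    pattern = os.path.join(root, "cpu[0-9]*", "cpufreq")
    for cpufreq_dir in sorted(glob.glob(pattern)):
        cpu = os.path.basename(os.path.dirname(cpufreq_dir))
        try:
            with open(os.path.join(cpufreq_dir, "scaling_governor")) as f:
                governor = f.read().strip()
            with open(os.path.join(cpufreq_dir, "scaling_cur_freq")) as f:
                cur_khz = int(f.read().strip())
        except (OSError, ValueError):
            continue                # file missing or unreadable: skip CPU
        result[cpu] = (governor, cur_khz)
    return result


def cpus_below(freqs, min_khz):
    """List CPUs whose current frequency is below min_khz."""
    return sorted(cpu for cpu, (_, khz) in freqs.items() if khz < min_khz)
```

On a 2.6GHz part, `cpus_below(read_cpufreq(), 2600000)` sampled while the NSD server is idle and again while it is under I/O load would show whether cores are in fact clocking down despite the settings.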
I'm wondering if this could be an issue with the C1E state? I'm curious
if anyone has seen anything like this. The node is a dx360 M4
(Sandy Bridge) with 16 2.6GHz cores and 32GB of RAM.
-Aaron
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss