[email protected]
Michal,
I think you need to revise your testing method. Let me explain.
Based on my understanding:
3 FE servers and one storage system
- ~4500 MiB/s from 8 RAID groups using XFS (one XFS filesystem per RAID group) and a parallel fio test.
- one GPFS filesystem across all 8 RAID groups, where performance dropped to ~3300 MiB/s.
The test you are running is a non-clustered fs versus a clustered fs.
XFS,
- 8 XFS filesystems.
- Each FS has its own array and independent metadata, not shared between nodes.
- Each array sees sequential IO and can aggregate IOs and prefetch on reads.
- No lock traffic between nodes.
- You didn’t mention whether the fio runs were from one node or from all three nodes with the filesystems spread across them?
Clustered,
- 1 filesystem (fs0) in this case.
- Parallel filesystem with shared metadata and shared access.
- Lock and metadata traffic across nodes.
- GPFS stripes across the NSDs, 8 in this case.
- Each fio stream results in a less sequential stream at the array level.
- The LBAs are spread out, making the array work harder because of the striping.
- The array logic will not see this as sequential and will deliver much lower performance from a sequential point of view, as each stream is intermixed.
What to do,
Try:
- 8 filesystems with your fio test, like the XFS test
- 1 FS on 1 array with a single matching fio job, then multiply the result by 8 (see the fio sketch below)
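For the 1 FS / 1 array case, something along these lines should be comparable to your per-XFS runs (a minimal sketch only; the mount point /gpfs/fs_test1 and the bs/size/iodepth values are my assumptions, adjust them to your setup):

    # single sequential stream against one filesystem backed by one RAID group
    fio --name=seq1 --directory=/gpfs/fs_test1 --rw=read --bs=2m --size=32g \
        --numjobs=1 --iodepth=16 --ioengine=libaio --direct=1 --group_reporting

Run the same job once per filesystem for the 8-FS variant and sum the bandwidth; that gives a like-for-like comparison against the single fs0 number.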
PS: You haven’t mentioned the type of array used. Sometimes the following is important:
- Disable prefetch at the array. Array prefetch can sometimes overwork the backend by fetching data that is never used, causing extra IO and cache displacement. For example, GPFS prefetches aggressively, which triggers the array to prefetch further, and neither set of prefetched data ends up being used.
Dale

Can you check the value of workerThreads? It seems to be at the default, which is 48 without protocols.
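A quick way to check and, if needed, change it (a sketch; 512 is only an illustrative value, and as far as I remember the change only takes effect after the daemon is restarted on the affected nodes):

    # value currently in effect on this node
    mmdiag --config | grep -i workerThreads
    # example change; pick a value that fits your node count and CPUs
    mmchconfig workerThreads=512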
This won't affect your current issue, but if you're doing a lot of large sequential IO you may want to consider setting prefetchPct to 40 to 60 percent instead of the default 20%. In our environment that has a measurable impact, but we have a lot less RAM than you do in the pagepool (8G versus 64G).
Also, do you have a dedicated metadata pool? If not, that could be a source of contention. I highly recommend a small pinnable LUN or two as a dedicated metadata pool.
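If you want to try either of those, roughly like this (a sketch; 40 is just an example percentage, and the device, NSD and server names in the stanza are placeholders for your own):

    # give sequential prefetch/write-behind a larger share of the pagepool (default 20)
    mmchconfig prefetchPct=40

    # NSD stanza sketch for a dedicated metadata disk (input to mmcrnsd / mmadddisk);
    # metadataOnly NSDs go into the system pool
    %nsd:
      device=/dev/mapper/meta_lun1
      nsd=meta_nsd1
      servers=server1,server2
      usage=metadataOnly
      failureGroup=1
      pool=system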
Alec
@Aaron Yes, I can confirm that 2MB blocks are transferred over.
@Jan-Frode We tried to change multiple parameters, but if you know the best combination for sequential IO, please let me know.

    # mmlsconfig
    autoload no
    dmapiFileHandleSize 32
    minReleaseLevel 5.1.9.0
    tscCmdAllowRemoteConnections no
    ccrEnabled yes
    cipherList AUTHONLY
    sdrNotifyAuthEnabled yes
    pagepool 64G
    maxblocksize 16384K
    maxMBpS 40000
    maxReceiverThreads 32
    nsdMaxWorkerThreads 512
    nsdMinWorkerThreads 8
    nsdMultiQueue 256
    nsdSmallThreadRatio 0
    nsdThreadsPerQueue 3
    prefetchAggressiveness 2
    adminMode central

    /dev/fs0
@Uwe Using iohist we found out that GPFS is overloading one dm-device (it took about 500 ms to finish IOs). We replaced the "problematic" dm-device with a new one (as we have enough drives to play with), but the overloading issue just jumped to another dm-device.
We believe that this behaviour is caused by GPFS, but we are unable to locate the root cause.
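For anyone who wants to reproduce the check, a sketch of the kind of commands involved (the grep pattern is only illustrative):

    # recent GPFS IO history with per-IO service times
    mmdiag --iohist
    # map NSD names to the local dm/multipath devices
    mmlsnsd -m
    # OS-level view to confirm which dm device is saturated (1-second samples)
    iostat -xm 1 | grep -E 'Device|dm-'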
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org