



Michal,


I think you need to revise your testing method. Let me explain.


Based on my understanding:


3 FE servers and one storage system



  • ~4500 MiB/s from 8 RAID groups using XFS (one XFS filesystem per RAID group) and a parallel fio test.
  • one FS across all 8 RAID groups, where we observed performance drop to ~3300 MiB/s.



The test you are running compares a non-clustered FS against a clustered FS.


XFS,


  • 8 XFS filesystems.
  • Each FS has its own array and independent metadata, not shared between nodes.
  • The array will see sequential IO for each RAID group and will be able to aggregate IOs and prefetch on read.
  • No lock traffic between nodes.
  • You didn't mention whether the FIO runs were on one node or on the three nodes with the filesystems spread across them.


Clustered,


  • 1 filesystem (fs0) in this case.
  • Parallel filesystem with shared metadata and access.
  • Lock and metadata traffic across nodes.
  • GPFS stripes across the NSDs, 8 in this case.
  • Each FIO stream will result in a less sequential stream at the array level.
  • The LBAs will be spread out, causing the array to work harder due to the striping.
  • The array logic will not see this as sequential and will deliver much lower performance from a sequential point of view, as each stream is intermixed.


 

What to do,


Try 


  1. 8 FS with your FIO test, like the XFS test.
  2. 1 FS on 1 array with a matching single FIO run (then multiply the result by 8); a minimal fio sketch follows below.
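
A minimal fio sketch for case 2, assuming the filesystem is mounted at /gpfs/fs0 and that the original XFS test used 2 MiB sequential reads; the path, IO direction, size and runtime are placeholders and should be matched to your XFS runs:

  # Placeholders: mount point, direction (--rw), size and runtime should mirror the XFS test.
  fio --name=seq1 --directory=/gpfs/fs0 --rw=read --bs=2m \
      --ioengine=libaio --direct=1 --iodepth=16 --numjobs=1 \
      --size=64g --runtime=120 --time_based --group_reporting

For case 1, run eight of these in parallel, one per filesystem mount point, exactly as in the XFS test.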



PS: You haven’t mention the type of array used? Sometimes the following is important.



  • Disable prefetch at the array. Array prefetch can sometimes overwork the backend by fetching data that is never used, causing extra IO and cache displacement. I.e., GPFS aggressively prefetches, which triggers the array to prefetch further, and neither result is used. (The GPFS-side counterpart is sketched below.)
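
Disabling prefetch at the array is vendor specific, so no command for that here. On the GPFS side the prefetch behaviour can be inspected and, if needed, dialled down. A sketch only; prefetchAggressiveness is already 2 in the mmlsconfig below, and its value semantics are not well documented, so treat the value as illustrative and verify it for your release before changing anything in production:

  # Check the current GPFS-side prefetch aggressiveness (2 per the mmlsconfig below):
  mmlsconfig prefetchAggressiveness
  # Illustrative only: lower values reduce GPFS sequential prefetch.
  mmchconfig prefetchAggressiveness=0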



Dale


On 9 Feb 2024, at 13:06, KG <[email protected]> wrote:


Can you check the value of workerThreads? It seems to be at the default, which is 48 without protocols.
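
For reference, a sketch of how to check and raise it; 512 is only an example value and the right number depends on the node:

  # Show the current setting (48 is the default without protocols):
  mmlsconfig workerThreads
  # Example only; workerThreads changes generally take effect after the GPFS daemon restarts.
  mmchconfig workerThreads=512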

On Fri, Feb 9, 2024 at 3:04 AM dale mac <[email protected]> wrote:
Michal,

I think you need to revise your testing method. Let me explain.

Based on my understanding:

3 FE servers and one storage system


  • ~4500 MiB/s from 8 RAID groups using XFS (one XFS per one RAID group) and parallel fio test.
  • one FS across all 8 RAID groups and we observed performance drop down to ~3300


The test you are running is a non-clustered fs versus a clustered fs.


XFS,


  • 8 XFS filesystems.
  • Each FS has its own Array and independent Meta, not shared between nodes
  • Array will see sequential IO for each array and will be able to aggregate IO’s and prefetch on write.
  • No lock traffic between nodes
  • Didn’t mention for the FIO runs is this one node or the three nodes with fs’s spread across?

Clustered,

  • 1 Filesystem (fs0) In this case
  • Parallel Filesystem with shared Meta and access
  • Lock and Meta traffic across nodes
  • GPFS Stripes across NSD, 8 in this case.  
  • Each FIO stream will result in a less sequential stream at the array level
  • The LBA will be spread causing the array to work harder
  • Array logic will not see this as sequential and will deliver a much lower performance from a sequential point of view.

 



What to do,


Try 

  1. 8 FS with your FIO test like XFS test
  2. 1 FS 1 Array and matching 1 FIO ( then x8 result)



PS: You haven’t mention the type of array used? Sometimes the following is important.


  • Disable prefetch at the array.  This causes the array to sometimes overwork its backend due to incorrectly fetching data that is never used, causing extra IO and cache displacement.  I.e. GPFS aggressively prefetches, which triggers the array to do further prefetch, and both are not used.


Dale

On 9 Feb 2024, at 6:30 am, Alec <[email protected]> wrote:

This won't affect your current issue, but if you're doing a lot of large sequential IO you may want to consider setting prefetchPct to 40 to 60 percent instead of the default 20%. In our environment that has a measurable impact, but we have a lot less RAM than you do in the pagepool (8G versus 64G).
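
A sketch of the change being suggested; 40 is just the lower end of the range mentioned above:

  # Check the current value (default 20), then raise it for large sequential IO:
  mmlsconfig prefetchPct
  mmchconfig prefetchPct=40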

Also, do you have a dedicated metadata pool? If not, that could be a source of contention. I highly recommend a small pinnable LUN or two as a dedicated metadata pool.
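
A sketch of what dedicated metadata NSDs might look like; the device names, NSD names, servers and failure groups are placeholders, and metadata-only disks must live in the system pool:

  # meta_stanza.txt (hypothetical; adjust names to your environment)
  %nsd: device=/dev/mapper/meta01 nsd=meta01 servers=nsd1,nsd2 usage=metadataOnly failureGroup=1 pool=system
  %nsd: device=/dev/mapper/meta02 nsd=meta02 servers=nsd2,nsd1 usage=metadataOnly failureGroup=2 pool=system

  # Create the NSDs and add them to the existing filesystem:
  mmcrnsd -F meta_stanza.txt
  mmadddisk fs0 -F meta_stanza.txt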

Alec

On Thu, Feb 8, 2024, 7:01 AM Michal Hruška <[email protected]> wrote:

@Aaron

Yes, I can confirm that 2MB blocks are transferred over.

@Jan-Frode

We tried to change multiple parameters, but if you know the best combination for sequential IO, please let me know.

 

#mmlsconfig
autoload no
dmapiFileHandleSize 32
minReleaseLevel 5.1.9.0
tscCmdAllowRemoteConnections no
ccrEnabled yes
cipherList AUTHONLY
sdrNotifyAuthEnabled yes
pagepool 64G
maxblocksize 16384K
maxMBpS 40000
maxReceiverThreads 32
nsdMaxWorkerThreads 512
nsdMinWorkerThreads 8
nsdMultiQueue 256
nsdSmallThreadRatio 0
nsdThreadsPerQueue 3
prefetchAggressiveness 2
adminMode central

/dev/fs0

@Uwe

Using iohist we found out that GPFS is overloading one dm-device (it took about 500 ms to finish IOs). We replaced the "problematic" dm-device (as we have enough drives to play with) with a new one, but the overloading issue just jumped to another dm-device.
We believe this behaviour is caused by GPFS, but we are unable to locate the root cause.
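
A minimal sketch of this kind of check, assuming the IO history is pulled with mmdiag --iohist on an NSD server and cross-checked with iostat; the dm-device name is a placeholder:

  # Recent GPFS IO history; look for entries with outlier completion times
  # (the ~500 ms IOs mentioned above):
  mmdiag --iohist
  # Cross-check the same device from the OS side (dm-12 is a placeholder name):
  iostat -xm 2 dm-12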

 

Best,
Michal

 

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
