Michal,

I think you need to revise your testing method. Let me explain.

Based on my understanding:

3 FE servers and one storage system.

~4500 MiB/s from 8 RAID groups using XFS (one XFS filesystem per RAID group) and a parallel fio test.
One GPFS filesystem across all 8 RAID groups, where performance dropped to ~3300 MiB/s.

The test you are running compares a non-clustered filesystem against a clustered filesystem.

XFS:

8 XFS filesystems.
Each FS has its own array and independent metadata, not shared between nodes.
Each array sees sequential IO and can aggregate IOs and prefetch on write.
No lock traffic between nodes.
You didn't mention: were the FIO runs from one node, or from all three nodes with the filesystems spread across them?

Clustered:

1 filesystem (fs0) in this case.
A parallel filesystem with shared metadata and shared access.
Lock and metadata traffic across nodes.
GPFS stripes across the NSDs, 8 in this case (see the rough example below).
Each FIO stream results in a less sequential stream at the array level.
The LBAs are spread out, causing the array to work harder.
The array logic will not see this as sequential and will deliver much lower performance from a sequential point of view.
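As a rough illustration (assuming round-robin striping on my part, and using the 2 MB block size Michal confirmed below and the 8 RAID groups in this setup): a 16 MiB sequential write is split into eight 2 MiB blocks, one to each NSD, so each array receives a single isolated 2 MiB chunk instead of one contiguous 16 MiB stream. With several FIO streams running in parallel, every array sees interleaved 2 MiB chunks from different files, and its sequential-detection and prefetch logic never engages.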
 


What to do:

Try:
8 filesystems with your FIO test, like the XFS test.
1 FS per array, with 1 matching FIO job each (then multiply the result by 8); a sketch of such a run is below.
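
A minimal sketch of what I mean, assuming hypothetical mount points /gpfs/fs1 .. /gpfs/fs8 (one filesystem per array) and the 2 MB transfers already confirmed in this thread; adjust ioengine, size, iodepth and paths to match your original XFS runs:

  # one sequential-write job per filesystem/array, all run in parallel; sum the 8 results
  for i in 1 2 3 4 5 6 7 8; do
      fio --name=seq$i --directory=/gpfs/fs$i --rw=write --bs=2m \
          --direct=1 --ioengine=libaio --iodepth=16 --numjobs=1 --size=50g &
  done
  wait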



PS: You haven't mentioned the type of array used. Sometimes the following is important.

Disable prefetch at the array. Array prefetch can sometimes overwork the backend by incorrectly fetching data that is never used, causing extra IO and cache displacement. I.e. GPFS already prefetches aggressively, which triggers the array to prefetch further, and both sets of prefetched data go unused.
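
Since prefetch has come up in this thread anyway: a minimal sketch (an aside, not a recommendation for your exact setup) of how the GPFS-side prefetch settings mentioned here could be inspected and, per Alec's suggestion below, adjusted. Verify the values and whether the parameter accepts an immediate change against the documentation for your 5.1.9 release before applying anything:

  # show the current GPFS prefetch settings
  # (prefetchAggressiveness appears in Michal's mmlsconfig below; prefetchPct is the parameter Alec mentions)
  mmlsconfig prefetchPct
  mmlsconfig prefetchAggressiveness
  # Alec's suggestion for large sequential IO: raise prefetchPct from the default 20%
  mmchconfig prefetchPct=40 -i    # -i: apply immediately (where the parameter allows) and persist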

Dale

> On 9 Feb 2024, at 6:30 am, Alec <[email protected]> wrote:
> 
> This won't affect your current issue but if you're doing a lot of large 
> sequential IO you may want to consider setting prefetchPct to 40 to 60 
> percent instead of default 20%.  In our environment that has measurable 
> impact, but we have a lot less ram than you do in the pagepool (8g versus 
> 64g).
> 
> Also do you have a dedicated meta pool?  If not that could be a source of 
> contention.  Highly recommend a small pinnable LUN or two as a dedicated meta 
> pool.
> 
> Alec
> 
> On Thu, Feb 8, 2024, 7:01 AM Michal Hruška <[email protected] 
> <mailto:[email protected]>> wrote:
>> @Aaron
>> 
>> Yes, I can confirm that 2MB blocks are transferred over.
>> 
>> 
>> @ Jan-Frode
>> 
>> We tried to change multiple parameters, but if you know the best combination 
>> for sequential IO, please let me know.
>> 
>>  
>> 
>> #mmlsconfig
>> 
>> autoload no
>> 
>> dmapiFileHandleSize 32
>> 
>> minReleaseLevel 5.1.9.0
>> 
>> tscCmdAllowRemoteConnections no
>> 
>> ccrEnabled yes
>> 
>> cipherList AUTHONLY
>> 
>> sdrNotifyAuthEnabled yes
>> 
>> pagepool 64G
>> 
>> maxblocksize 16384K
>> 
>> maxMBpS 40000
>> 
>> maxReceiverThreads 32
>> 
>> nsdMaxWorkerThreads 512
>> 
>> nsdMinWorkerThreads 8
>> 
>> nsdMultiQueue 256
>> 
>> nsdSmallThreadRatio 0
>> 
>> nsdThreadsPerQueue 3
>> 
>> prefetchAggressiveness 2
>> 
>> adminMode central
>> 
>>  
>> 
>> /dev/fs0
>> 
>> 
>> @Uwe
>> 
>> Using iohist we found out that GPFS is overloading one dm-device (it took 
>> about 500 ms to finish IOs). We replaced the „problematic“ dm-device (as we 
>> have enough drives to play with) with a new one, but the overloading issue 
>> just jumped to another dm-device.
>> We believe that this behaviour is caused by GPFS, but we are unable to 
>> locate the root cause of it.
>> 
>>  
>> 
>> Best,
>> Michal
>> 
>>  
>> 

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
