



Michal,


I think you need to revise your testing method. Let me explain.


Based on my understanding:


3 FE servers and one storage system



  • ~4500 MiB/s from 8 RAID groups using XFS (one XFS filesystem per RAID group) and a parallel fio test.
  • one FS across all 8 RAID groups, where we observed performance drop to ~3300 MiB/s.



The test you are running compares a non-clustered FS against a clustered FS.


XFS,


  • 8 XFS filesystems.
  • Each FS has its own array and independent metadata, not shared between nodes.
  • The array will see sequential IO for each RAID group and will be able to aggregate IOs and prefetch on read.
  • No lock traffic between nodes.
  • You didn't mention whether the FIO runs were on one node or on the three nodes with the filesystems spread across them.


Clustered,


  • 1 filesystem (fs0) in this case.
  • Parallel filesystem with shared metadata and access.
  • Lock and metadata traffic across nodes.
  • GPFS stripes across the NSDs, 8 in this case.
  • Each FIO stream will result in a less sequential stream at the array level.
  • The LBAs will be spread out, causing the array to work harder due to the striping.
  • The array logic will not see this as sequential and will deliver much lower performance from a sequential point of view, as each stream is intermixed.


 

What to do,


Try 


  1. 8 FS with your FIO test, like the XFS test.
  2. 1 FS on 1 array with a matching single FIO run (then multiply the result by 8); a minimal fio sketch follows below.
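
A minimal fio sketch for case 2, assuming the filesystem is mounted at /gpfs/fs0 and that the original XFS test used 2 MiB sequential reads; the path, IO direction, size and runtime are placeholders and should be matched to your XFS runs:

  # Placeholders: mount point, direction (--rw), size and runtime should mirror the XFS test.
  fio --name=seq1 --directory=/gpfs/fs0 --rw=read --bs=2m \
      --ioengine=libaio --direct=1 --iodepth=16 --numjobs=1 \
      --size=64g --runtime=120 --time_based --group_reporting

For case 1, run eight of these in parallel, one per filesystem mount point, exactly as in the XFS test.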



PS: You haven’t mention the type of array used? Sometimes the following is important.



  • Disable prefetch at the array. Array prefetch can sometimes overwork the backend by fetching data that is never used, causing extra IO and cache displacement. I.e., GPFS aggressively prefetches, which triggers the array to prefetch further, and neither result is used. (The GPFS-side counterpart is sketched below.)
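
Disabling prefetch at the array is vendor specific, so no command for that here. On the GPFS side the prefetch behaviour can be inspected and, if needed, dialled down. A sketch only; prefetchAggressiveness is already 2 in the mmlsconfig below, and its value semantics are not well documented, so treat the value as illustrative and verify it for your release before changing anything in production:

  # Check the current GPFS-side prefetch aggressiveness (2 per the mmlsconfig below):
  mmlsconfig prefetchAggressiveness
  # Illustrative only: lower values reduce GPFS sequential prefetch.
  mmchconfig prefetchAggressiveness=0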



Dale


On 9 Feb 2024, at 13:06, KG <[email protected]> wrote:


Can you check the value of workerThreads? It seems to be at the default, which is 48 without protocols.
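
For reference, a sketch of how to check and raise it; 512 is only an example value and the right number depends on the node:

  # Show the current setting (48 is the default without protocols):
  mmlsconfig workerThreads
  # Example only; workerThreads changes generally take effect after the GPFS daemon restarts.
  mmchconfig workerThreads=512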

On Fri, Feb 9, 2024 at 3:04 AM dale mac <[email protected]> wrote:
Michal,

I think you need to revise your testing method. Let me explain.

Based on my understanding:

3 FE servers and one storage system


  • ~4500 MiB/s from 8 RAID groups using XFS (one XFS per one RAID group) and parallel fio test.
  • one FS across all 8 RAID groups and we observed performance drop down to ~3300


The test you are running is a non-clustered fs versus a clustered fs.


XFS,


  • 8 XFS filesystems.
  • Each FS has its own Array and independent Meta, not shared between nodes
  • Array will see sequential IO for each array and will be able to aggregate IO’s and prefetch on write.
  • No lock traffic between nodes
  • Didn’t mention for the FIO runs is this one node or the three nodes with fs’s spread across?

Clustered,

  • 1 Filesystem (fs0) In this case
  • Parallel Filesystem with shared Meta and access
  • Lock and Meta traffic across nodes
  • GPFS Stripes across NSD, 8 in this case.  
  • Each FIO stream will result in a less sequential stream at the array level
  • The LBA will be spread causing the array to work harder
  • Array logic will not see this as sequential and will deliver a much lower performance from a sequential point of view.

 



What to do,


Try 

  1. 8 FS with your FIO test like XFS test
  2. 1 FS 1 Array and matching 1 FIO ( then x8 result)



PS: You haven’t mention the type of array used? Sometimes the following is important.


  • Disable prefetch at the array.  This causes the array to sometimes overwork its backend due to incorrectly fetching data that is never used, causing extra IO and cache displacement.  I.e. GPFS aggressively prefetches, which triggers the array to do further prefetch, and both are not used.


Dale

On 9 Feb 2024, at 6:30 am, Alec <[email protected]> wrote:

This won't affect your current issue, but if you're doing a lot of large sequential IO you may want to consider setting prefetchPct to 40 to 60 percent instead of the default 20%. In our environment that has a measurable impact, but we have a lot less RAM than you do in the pagepool (8G versus 64G).
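
A sketch of the change being suggested; 40 is just the lower end of the range mentioned above:

  # Check the current value (default 20), then raise it for large sequential IO:
  mmlsconfig prefetchPct
  mmchconfig prefetchPct=40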

Also, do you have a dedicated metadata pool? If not, that could be a source of contention. I highly recommend a small pinnable LUN or two as a dedicated metadata pool.
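
A sketch of what dedicated metadata NSDs might look like; the device names, NSD names, servers and failure groups are placeholders, and metadata-only disks must live in the system pool:

  # meta_stanza.txt (hypothetical; adjust names to your environment)
  %nsd: device=/dev/mapper/meta01 nsd=meta01 servers=nsd1,nsd2 usage=metadataOnly failureGroup=1 pool=system
  %nsd: device=/dev/mapper/meta02 nsd=meta02 servers=nsd2,nsd1 usage=metadataOnly failureGroup=2 pool=system

  # Create the NSDs and add them to the existing filesystem:
  mmcrnsd -F meta_stanza.txt
  mmadddisk fs0 -F meta_stanza.txt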

Alec

On Thu, Feb 8, 2024, 7:01 AM Michal Hruška <[email protected]> wrote:

@Aaron

Yes, I can confirm that 2MB blocks are transferred over.

@Jan-Frode

We tried to change multiple parameters, but if you know the best combination for sequential IO, please let me know.

 

#mmlsconfig
autoload no
dmapiFileHandleSize 32
minReleaseLevel 5.1.9.0
tscCmdAllowRemoteConnections no
ccrEnabled yes
cipherList AUTHONLY
sdrNotifyAuthEnabled yes
pagepool 64G
maxblocksize 16384K
maxMBpS 40000
maxReceiverThreads 32
nsdMaxWorkerThreads 512
nsdMinWorkerThreads 8
nsdMultiQueue 256
nsdSmallThreadRatio 0
nsdThreadsPerQueue 3
prefetchAggressiveness 2
adminMode central

/dev/fs0

@Uwe

Using iohist we found out that GPFS is overloading one dm-device (it took about 500 ms to finish IOs). We replaced the "problematic" dm-device (as we have enough drives to play with) with a new one, but the overloading issue just jumped to another dm-device.
We believe this behaviour is caused by GPFS, but we are unable to locate the root cause.
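
A minimal sketch of this kind of check, assuming the IO history is pulled with mmdiag --iohist on an NSD server and cross-checked with iostat; the dm-device name is a placeholder:

  # Recent GPFS IO history; look for entries with outlier completion times
  # (the ~500 ms IOs mentioned above):
  mmdiag --iohist
  # Cross-check the same device from the OS side (dm-12 is a placeholder name):
  iostat -xm 2 dm-12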

 

Best,
Michal

 

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org
