Have you disabled prefetching on the FS7300? chsystem -cache_prefetch off

-jf
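
For reference, a minimal sketch of checking and flipping that setting from the FlashSystem (Spectrum Virtualize) CLI; the management host name here is a placeholder, and the exact lssystem field name should be verified on your code level:

    # show the current cache prefetch setting (assuming the field is named cache_prefetch)
    ssh superuser@fs7300-mgmt "lssystem | grep -i prefetch"

    # disable cache prefetching, as suggested above
    ssh superuser@fs7300-mgmt "chsystem -cache_prefetch off"

    # re-enable it later if it makes no difference
    ssh superuser@fs7300-mgmt "chsystem -cache_prefetch on"
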
On Wed, 14 Feb 2024 at 19:31, Michal Hruška <[email protected]> wrote:

> Dear friends,
>
> Thank you all for your time and thoughts/ideas!
>
> The main goal of sharing our test results comparing XFS and GPFS was to show that the storage subsystem can do better when the I/O is presented differently. We were not trying to compare XFS and GPFS directly; we understand that there will be some performance drop with GPFS (compared to "raw" performance), but we are surprised by the ~20-25% drop.
>
> We tried changing several of the suggested parameters but got no performance gain. Since nothing changed, we did more troubleshooting with different configurations.
>
> To explain what we tried, I have to describe our environment a bit more:
>
> Our underlying storage system is an IBM FS7300 (each controller has 384 GB of cache). There are 8 DRAIDs (8+2+1). Each DRAID has its own pool, and each pool has one volume (LUN). Every FE server (we have 3 of them) is connected directly to this storage over two 32 Gb FC links. The 3 client servers and the FE servers are connected to the LAN switch over 100GbE.
>
> Testing results (metadata are located on an NVMe SSD DRAID):
>
> 1. We used a second, identical storage system to test performance, but we get almost the same results as with the first one. In iohist we can see that one LUN (dm-device) is probably overloaded, as its I/O time is high: from 300 to 500 ms.
> 2. Using both storage systems together in one big GPFS file system: only one dm-device is slow at a time (according to the iohist output), but the "problematic" dm-device changes over time.
> 3. During our tests we also tried a synchronous fio test, but we observed a significant performance drop.
> 4. We compared single-LUN performance of GPFS against XFS: GPFS 435 MB/s versus XFS 485 MB/s, from a single server. That drop is not so significant, but when we added more LUNs to the comparison the drop became more painful.
>
> For this testing "session" we gathered data with Storage Insights to check the storage performance:
>
> 1. There is no problematic HDD: the worst latency seen is 42 ms across all 176 drives in the two storage systems; the average latency is 15 ms.
> 2. CPU usage was at 25% max.
> 3. "Problematic" DRAID latency: the average is 16 ms, the worst is 430 ms. I cannot tell whether there was the same latency peak during the XFS tests, but I think not (or not as bad), since XFS performs better than GPFS.
> 4. During our tests the write cache for all pools was fully allocated, for both the XFS and the GPFS tests. That is the expected state, as the cache is much faster than the HDDs and should help organize writes before they are forwarded to the RAID groups.
>
> Do you see any other possible problems we missed?
>
> I do not want to leave this behind "unfinished", but I am out of ideas. 😊
>
> Best,
> Michal
>
> From: Michal Hruška
> Sent: Thursday, February 8, 2024 3:59 PM
> To: '[email protected]' <[email protected]>
> Subject: Re: [gpfsug-discuss] sequential I/O write - performance
>
> @Aaron
> Yes, I can confirm that 2 MB blocks are transferred over.
>
> @Jan-Frode
> We tried to change multiple parameters, but if you know the best combination for sequential I/O, please let me know.
> #mmlsconfig
> autoload no
> dmapiFileHandleSize 32
> minReleaseLevel 5.1.9.0
> tscCmdAllowRemoteConnections no
> ccrEnabled yes
> cipherList AUTHONLY
> sdrNotifyAuthEnabled yes
> pagepool 64G
> maxblocksize 16384K
> maxMBpS 40000
> maxReceiverThreads 32
> nsdMaxWorkerThreads 512
> nsdMinWorkerThreads 8
> nsdMultiQueue 256
> nsdSmallThreadRatio 0
> nsdThreadsPerQueue 3
> prefetchAggressiveness 2
> adminMode central
>
> /dev/fs0
>
> @Uwe
> Using iohist we found that GPFS is overloading one dm-device (it takes about 500 ms to finish its I/Os). We replaced the "problematic" dm-device with a new one (we have enough drives to play with), but the overloading issue just jumped to another dm-device.
> We believe this behaviour is caused by GPFS, but we are unable to locate its root cause.
>
> Best,
> Michal
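
For anyone who wants to reproduce the comparison, a rough sketch of the kind of sequential-write fio run and iohist sampling described above; the mount point, file size, and job count are placeholders, not the exact values used in these tests:

    # sequential writes in 2 MiB blocks onto the GPFS file system (placeholder path and sizes)
    fio --name=seqwrite --directory=/gpfs/fs0/fiotest --rw=write --bs=2m \
        --size=100g --numjobs=8 --iodepth=16 --ioengine=libaio --direct=1 \
        --group_reporting

    # meanwhile, on the NSD servers, sample the recent GPFS I/O history;
    # the time-in-ms column shows how long each I/O took and which disk it went to
    mmdiag --iohist | tail -n 50

If one LUN consistently shows 300-500 ms times there while the others stay low, comparing that interval against the per-volume latency in Storage Insights should help show whether the slowness originates on the storage side or above it.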
