Hi Venkat,

The AFM fileset becomes totally unresponsive from all nodes within the cluster, and the only way to resolve it is to run "mmshutdown", wait 2 minutes, then run "mmshutdown" again, as it cannot really complete the first time, and then "mmstartup". After that everything is back to normal: AFM is stopped and can be started again for another week or so.
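The recovery sequence above can be sketched as a small shell script. This is a hypothetical illustration, not official IBM tooling; it defaults to dry-run (DRY_RUN=1) so it only prints the commands instead of executing them on a gateway node:

```shell
#!/bin/sh
# Hedged sketch of the workaround described above. The 2-minute wait is
# taken from the thread; defaulting to dry-run is an assumption for safety.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run mmshutdown              # first attempt usually does not fully complete
run sleep 120               # wait ~2 minutes before retrying
run mmshutdown              # second attempt actually brings the daemon down
run mmstartup               # daemon comes back; AFM can be started again
```

Set DRY_RUN=0 only when you actually intend to cycle the GPFS daemon on that node.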
mmafmctl <filesystem> stop -j <fileset> will just hang endlessly. I will try setting that value and see if it does anything for us :)

Thanks!

Best Regards
Andi Christiansen

> On April 28, 2020 1:37 PM Venkateswara R Puvvada <vpuvv...@in.ibm.com> wrote:
>
> Hi,
>
> What is lock down of AFM fileset? Are the messages in requeued state
> and AFM won't replicate any data? I would recommend opening a ticket by
> collecting the logs and internaldump from the gateway node when the
> replication is stuck.
>
> You can also try increasing the value of the afmAsyncOpWaitTimeout option
> and see if this solves the issue.
>
> mmchconfig afmAsyncOpWaitTimeout=3600 -i
>
> ~Venkat (vpuvv...@in.ibm.com)
>
>
> From: Andi Christiansen <a...@christiansen.xxx>
> To: "gpfsug-discuss@spectrumscale.org" <gpfsug-discuss@spectrumscale.org>
> Date: 04/28/2020 12:04 PM
> Subject: [EXTERNAL] [gpfsug-discuss] Tuning Spectrum Scale AFM for stability?
> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>
> ---------------------------------------------
>
> Hi All,
>
> Can anyone share some thoughts on how to tune AFM for stability? At the
> moment we have OK performance between our sites (5-8 Gbit/s with 34 ms
> latency), but we encounter a lock down of the cache fileset from week to
> week, which was day to day before we tuned the settings below. Is there any
> way to tune AFM further that I haven't found?
> Cache Site only:
> TCP Settings:
> sunrpc.tcp_slot_table_entries = 128
>
> Home and Cache:
> AFM / GPFS Settings:
> maxBufferDescs=163840
> afmHardMemThreshold=25G
> afmMaxWriteMergeLen=30G
>
> Cache fileset:
> Attributes for fileset AFMFILESET:
> ================================
> Status                              Linked
> Path                                /mnt/fs02/AFMFILESET
> Id                                  1
> Root inode                          524291
> Parent Id                           0
> Created                             Tue Apr 14 15:57:43 2020
> Comment
> Inode space                         1
> Maximum number of inodes            10000384
> Allocated inodes                    10000384
> Permission change flag              chmodAndSetacl
> afm-associated                      Yes
> Target                              nfs://DK_VPN/mnt/fs01/AFMFILESET
> Mode                                single-writer
> File Lookup Refresh Interval        30 (default)
> File Open Refresh Interval          30 (default)
> Dir Lookup Refresh Interval         60 (default)
> Dir Open Refresh Interval           60 (default)
> Async Delay                         15 (default)
> Last pSnapId                        0
> Display Home Snapshots              no
> Number of Read Threads per Gateway  64
> Parallel Read Chunk Size            128
> Parallel Read Threshold             1024
> Number of Gateway Flush Threads     48
> Prefetch Threshold                  0 (default)
> Eviction Enabled                    yes (default)
> Parallel Write Threshold            1024
> Parallel Write Chunk Size           128
> Number of Write Threads per Gateway 16
> IO Flags                            0 (default)
>
> mmfsadm dump afm:
> AFM Gateway:
> RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072
> readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648
> readBypassThresh 67108864
> QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600
> Ping thread: Started
> Fileset: AFMFILESET 1 (fs02)
> mode: single-writer queue: Normal MDS: <c0n1> QMem 0 CTL 577
> home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16
> handler: Mounted Dirty refCount: 1
> queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0
> remoteAttrs: AsyncLookups 0 tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0,000000 maxTime 0.0
> queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78
> handlerCreateTime : 2020-04-27_11:14:57.415+0200 numCreateSnaps : 0 InflightAsyncLookups 0
> lastReplayTime : 2020-04-28_07:22:32.415+0200 lastSyncTime : 2020-04-27_15:09:57.415+0200
> i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64
> i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824
> i/o: prefetchThresh 0 (Prefetch)
> Mnt status: 0:0 1:0 2:0 3:0
> Export Map: 10.110.5.10/<c0n0> 10.110.5.11/<c0n1> 10.110.5.12/<c0n2> 10.110.5.13/<c0n9>
> Priority Queue: Empty (state: Active)
> Normal Queue: Empty (state: Active)
>
> Cluster Config Cache:
> maxFilesToCache 131072
> maxStatCache 524288
> afmDIO 2
> afmIOFlags 4096
> maxReceiverThreads 32
> afmNumReadThreads 64
> afmNumWriteThreads 8
> afmHardMemThreshold 26843545600
> maxBufferDescs 163840
> afmMaxWriteMergeLen 32212254720
> workerThreads 1024
>
> The entries in the GPFS log state "AFM: Home is taking longer to
> respond...", but it is only AFM and the cache AFM fileset that enter a
> locked state. We have the same NFS exports from home mounted on the same
> gateway nodes to check when a file is transferred, and they are all OK
> while the AFM lock is happening. A simple GPFS restart of the AFM master
> node is enough to make AFM restart and continue for another week.
>
> The home target is exported through CES NFS from 4 CES nodes, and a map is
> created at the cache site to utilize the parallel writes feature.
>
> If there is anyone sitting around with some ideas/knowledge on how to
> tune this further for more stability, I would be happy if you could share
> your thoughts about it! :-)
>
> Many Thanks in Advance!
> Andi Christiansen
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
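For reference, the tuning values quoted in the thread above could be applied with mmchconfig and sysctl along these lines. This sketch only prints the commands rather than executing them; whether these particular values suit another cluster is an assumption, so copy them out and run them deliberately:

```shell
#!/bin/sh
# Hedged sketch: shows (does not execute) the tuning commands for the
# settings discussed in this thread. Values come from the quoted post.
show() { echo "+ $*"; }

# Cache site only (NFS client TCP settings):
show sysctl -w sunrpc.tcp_slot_table_entries=128

# Home and cache (AFM / GPFS settings; -i applies the change immediately):
show mmchconfig maxBufferDescs=163840 -i
show mmchconfig afmHardMemThreshold=25G -i
show mmchconfig afmMaxWriteMergeLen=30G -i

# Venkat's suggestion from this thread, to give queued AFM operations
# longer before they time out:
show mmchconfig afmAsyncOpWaitTimeout=3600 -i
```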
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss