Hi,

What do you mean by a lock down of the AFM fileset? Are the messages in a requeued state, so that AFM won't replicate any data? I would recommend opening a ticket and collecting the logs and an internaldump from the gateway node while the replication is stuck.
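Before restarting anything, you can confirm whether the queue is actually stuck with something like the following (a sketch, using the file system and fileset names from your listing below; run it on the gateway node):

    # show the AFM fileset state (Active, Inactive, Dirty, Disconnected, Dropped, ...)
    # along with the queue length and number of executed operations
    mmafmctl fs02 getstate -j AFMFILESET

    # collect diagnostic data for the support ticket
    gpfs.snap

If getstate reports the fileset as Dropped or Disconnected while home is reachable, that output together with the gpfs.snap data is what support will want to see.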
You can also try increasing the value of the afmAsyncOpWaitTimeout option and see if this solves the issue:

    mmchconfig afmAsyncOpWaitTimeout=3600 -i

~Venkat ([email protected])

From: Andi Christiansen <[email protected]>
To: "[email protected]" <[email protected]>
Date: 04/28/2020 12:04 PM
Subject: [EXTERNAL] [gpfsug-discuss] Tuning Spectrum Scale AFM for stability?
Sent by: [email protected]

Hi All,

Can anyone share some thoughts on how to tune AFM for stability? At the moment we have OK performance between our sites (5-8 Gbit/s with 34 ms latency), but we encounter a lock down of the cache fileset from week to week; before we tuned the settings below it was day to day. Is there any way to tune AFM further that I haven't found?

Cache site only, TCP settings:

    sunrpc.tcp_slot_table_entries = 128

Home and cache, AFM / GPFS settings:

    maxBufferDescs=163840
    afmHardMemThreshold=25G
    afmMaxWriteMergeLen=30G

Cache fileset:

Attributes for fileset AFMFILESET:
==================================
Status                                   Linked
Path                                     /mnt/fs02/AFMFILESET
Id                                       1
Root inode                               524291
Parent Id                                0
Created                                  Tue Apr 14 15:57:43 2020
Comment
Inode space                              1
Maximum number of inodes                 10000384
Allocated inodes                         10000384
Permission change flag                   chmodAndSetacl
afm-associated                           Yes
Target                                   nfs://DK_VPN/mnt/fs01/AFMFILESET
Mode                                     single-writer
File Lookup Refresh Interval             30 (default)
File Open Refresh Interval               30 (default)
Dir Lookup Refresh Interval              60 (default)
Dir Open Refresh Interval                60 (default)
Async Delay                              15 (default)
Last pSnapId                             0
Display Home Snapshots                   no
Number of Read Threads per Gateway       64
Parallel Read Chunk Size                 128
Parallel Read Threshold                  1024
Number of Gateway Flush Threads          48
Prefetch Threshold                       0 (default)
Eviction Enabled                         yes (default)
Parallel Write Threshold                 1024
Parallel Write Chunk Size                128
Number of Write Threads per Gateway      16
IO Flags                                 0 (default)

mmfsadm dump afm:

AFM Gateway: RpcQLen: 0 maxPoolSize: 4294967295 QOF: 0 MaxOF: 131072
  readThLimit 128 minIOBuf 1048576 maxIOBuf 1073741824 msgMaxWriteSize 2147483648 readBypassThresh 67108864
  QLen: 0 QMem: 0 SoftQMem: 10737418240 HardQMem 26843545600
  Ping thread: Started
Fileset: AFMFILESET 1 (fs02) mode: single-writer queue: Normal MDS: <c0n1> QMem 0 CTL 577
  home: DK_VPN homeServer: 10.110.5.11 proto: nfs port: 2049 lastCmd: 16
  handler: Mounted Dirty refCount: 1
  queueTransfer: state: Idle senderVerified: 0 receiverVerified: 1 terminate: 0 psnapWait: 0
  remoteAttrs: AsyncLookups 0
  tsfindinode: success 0 failed 0 totalTime 0.0 avgTime 0.000000 maxTime 0.0
  queue: delay 15 QLen 0+0 flushThds 0 maxFlushThds 48 numExec 8772518 qfs 0 iwo 0 err 78
  handlerCreateTime : 2020-04-27_11:14:57.415+0200
  numCreateSnaps : 0 InflightAsyncLookups 0
  lastReplayTime : 2020-04-28_07:22:32.415+0200
  lastSyncTime : 2020-04-27_15:09:57.415+0200
  i/o: readBuf: 33554432 writeBuf: 2097152 sparseReadThresh: 134217728 pReadThreads 64
  i/o: pReadChunkSize 33554432 pReadThresh: 1073741824 pWriteThresh: 1073741824
  i/o: prefetchThresh 0 (Prefetch)
  Mnt status: 0:0 1:0 2:0 3:0
  Export Map: 10.110.5.10/<c0n0> 10.110.5.11/<c0n1> 10.110.5.12/<c0n2> 10.110.5.13/<c0n9>
  Priority Queue: Empty (state: Active)
  Normal Queue: Empty (state: Active)

Cluster Config Cache:
  maxFilesToCache 131072
  maxStatCache 524288
  afmDIO 2
  afmIOFlags 4096
  maxReceiverThreads 32
  afmNumReadThreads 64
  afmNumWriteThreads 8
  afmHardMemThreshold 26843545600
  maxBufferDescs 163840
  afmMaxWriteMergeLen 32212254720
  workerThreads 1024
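For reference, the tuning values listed above map to commands like these (a sketch; the sysctl change is runtime-only and needs to be persisted separately via sysctl.conf or modprobe options depending on the distribution, while mmchconfig -i applies the change immediately and permanently):

    # Cache site only: raise the NFS client RPC slot table
    sysctl -w sunrpc.tcp_slot_table_entries=128

    # Home and cache clusters: AFM / GPFS settings
    mmchconfig maxBufferDescs=163840 -i
    mmchconfig afmHardMemThreshold=25G -i
    mmchconfig afmMaxWriteMergeLen=30G -i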
The entries in the GPFS log state "AFM: Home is taking longer to respond...", but it is only AFM and the cache AFM fileset that enter a locked state. We have the same NFS exports from home mounted on the same gateway nodes so we can check when files are transferred, and they are all fine while the AFM lockup is happening. A simple GPFS restart of the AFM master node is enough to make AFM recover and continue for another week.

The home target is exported through CES NFS from 4 CES nodes, and a mapping is created at the cache site to utilize the parallel writes feature (see the sketch below).
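For anyone wanting to reproduce this setup, the mapping for parallel writes would be defined roughly like this (a sketch; the mapping name and server IPs are taken from the export map above, while the gateway node names gw1-gw4 are placeholders):

    # map each home NFS export server to a cache gateway node
    mmafmconfig add DK_VPN --export-map 10.110.5.10/gw1,10.110.5.11/gw2,10.110.5.12/gw3,10.110.5.13/gw4

    # the fileset target then references the mapping instead of a single server
    mmcrfileset fs02 AFMFILESET --inode-space new \
        -p afmTarget=nfs://DK_VPN/mnt/fs01/AFMFILESET -p afmMode=single-writer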
If anyone has ideas or knowledge on how to tune this further for more stability, I would be happy if you could share your thoughts! :-)

Many thanks in advance!
Andi Christiansen

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
