You are basically running GSS 1.0 code, while the current version is GSS 2.0 (which replaced version 1.5 two months ago).
GSS 1.5 and 2.0 have several enhancements in this space, so I strongly encourage you to upgrade your systems. If you can describe your workload in a bit more detail, there might also be additional knobs we can turn to change the behavior.

------------------------------------------
Sven Oehme
Scalable Storage Research
email: [email protected]
Phone: +1 (408) 824-8904
IBM Almaden Research Lab
------------------------------------------

[email protected] wrote on 10/14/2014 09:39:18 AM:

> From: Salvatore Di Nardo <[email protected]>
> To: gpfsug main discussion list <[email protected]>
> Date: 10/14/2014 09:40 AM
> Subject: Re: [gpfsug-discuss] wait for permission to append to log
> Sent by: [email protected]
>
> Thanks in advance for your help.
>
> We have 6 RG:
> recovery group      vdisks       vdisks  servers
> ------------------  -----------  ------  -------
> gss01a                        4       8  gss01a.ebi.ac.uk,gss01b.ebi.ac.uk
> gss01b                        4       8  gss01b.ebi.ac.uk,gss01a.ebi.ac.uk
> gss02a                        4       8  gss02a.ebi.ac.uk,gss02b.ebi.ac.uk
> gss02b                        4       8  gss02b.ebi.ac.uk,gss02a.ebi.ac.uk
> gss03a                        4       8  gss03a.ebi.ac.uk,gss03b.ebi.ac.uk
> gss03b                        4       8  gss03b.ebi.ac.uk,gss03a.ebi.ac.uk
>
> Check the attached file for RG details.
> Following mmlsconfig:
> [root@gss01a ~]# mmlsconfig
> Configuration data for cluster GSS.ebi.ac.uk:
> ---------------------------------------------
> myNodeConfigNumber 1
> clusterName GSS.ebi.ac.uk
> clusterId 17987981184946329605
> autoload no
> dmapiFileHandleSize 32
> minReleaseLevel 3.5.0.11
> [gss01a,gss01b,gss02a,gss02b,gss03a,gss03b]
> pagepool 38g
> nsdRAIDBufferPoolSizePct 80
> maxBufferDescs 2m
> numaMemoryInterleave yes
> prefetchPct 5
> maxblocksize 16m
> nsdRAIDTracks 128k
> ioHistorySize 64k
> nsdRAIDSmallBufferSize 256k
> nsdMaxWorkerThreads 3k
> nsdMinWorkerThreads 3k
> nsdRAIDSmallThreadRatio 2
> nsdRAIDThreadsPerQueue 16
> nsdClientCksumTypeLocal ck64
> nsdClientCksumTypeRemote ck64
> nsdRAIDEventLogToConsole all
> nsdRAIDFastWriteFSDataLimit 64k
> nsdRAIDFastWriteFSMetadataLimit 256k
> nsdRAIDReconstructAggressiveness 1
> nsdRAIDFlusherBuffersLowWatermarkPct 20
> nsdRAIDFlusherBuffersLimitPct 80
> nsdRAIDFlusherTracksLowWatermarkPct 20
> nsdRAIDFlusherTracksLimitPct 80
> nsdRAIDFlusherFWLogHighWatermarkMB 1000
> nsdRAIDFlusherFWLogLimitMB 5000
> nsdRAIDFlusherThreadsLowWatermark 1
> nsdRAIDFlusherThreadsHighWatermark 512
> nsdRAIDBlockDeviceMaxSectorsKB 4096
> nsdRAIDBlockDeviceNrRequests 32
> nsdRAIDBlockDeviceQueueDepth 16
> nsdRAIDBlockDeviceScheduler deadline
> nsdRAIDMaxTransientStale2FT 1
> nsdRAIDMaxTransientStale3FT 1
> syncWorkerThreads 256
> tscWorkerPool 64
> nsdInlineWriteMax 32k
> maxFilesToCache 12k
> maxStatCache 512
> maxGeneralThreads 1280
> flushedDataTarget 1024
> flushedInodeTarget 1024
> maxFileCleaners 1024
> maxBufferCleaners 1024
> logBufferCount 20
> logWrapAmountPct 2
> logWrapThreads 128
> maxAllocRegionsPerNode 32
> maxBackgroundDeletionThreads 16
> maxInodeDeallocPrefetch 128
> maxMBpS 16000
> maxReceiverThreads 128
> worker1Threads 1024
> worker3Threads 32
> [common]
> cipherList AUTHONLY
> socketMaxListenConnections 1500
> failureDetectionTime 60
> [common]
> adminMode central
>
> File systems in cluster GSS.ebi.ac.uk:
> --------------------------------------
> /dev/gpfs1
>
> For more configuration parameters I also attached a file with the
> complete output of mmdiag --config.
>
> and mmlsfs:
>
> File system attributes for /dev/gpfs1:
> ======================================
> flag                value                    description
> ------------------- ------------------------ -----------------------------------
> -f                  32768                    Minimum fragment size in bytes (system pool)
>                     262144                   Minimum fragment size in bytes (other pools)
> -i                  512                      Inode size in bytes
> -I                  32768                    Indirect block size in bytes
> -m                  2                        Default number of metadata replicas
> -M                  2                        Maximum number of metadata replicas
> -r                  1                        Default number of data replicas
> -R                  2                        Maximum number of data replicas
> -j                  scatter                  Block allocation type
> -D                  nfs4                     File locking semantics in effect
> -k                  all                      ACL semantics in effect
> -n                  1000                     Estimated number of nodes that will mount file system
> -B                  1048576                  Block size (system pool)
>                     8388608                  Block size (other pools)
> -Q                  user;group;fileset       Quotas enforced
>                     user;group;fileset       Default quotas enabled
> --filesetdf         no                       Fileset df enabled?
> -V                  13.23 (3.5.0.7)          File system version
> --create-time       Tue Mar 18 16:01:24 2014 File system creation time
> -u                  yes                      Support for large LUNs?
> -z                  no                       Is DMAPI enabled?
> -L                  4194304                  Logfile size
> -E                  yes                      Exact mtime mount option
> -S                  yes                      Suppress atime mount option
> -K                  whenpossible             Strict replica allocation option
> --fastea            yes                      Fast external attributes enabled?
> --inode-limit       134217728                Maximum number of inodes
> -P                  system;data              Disk storage pools in file system
> -d                  gss01a_MetaData_8M_3p_1;gss01a_MetaData_8M_3p_2;gss01a_MetaData_8M_3p_3;gss01b_MetaData_8M_3p_1;gss01b_MetaData_8M_3p_2;gss01b_MetaData_8M_3p_3;gss02a_MetaData_8M_3p_1;
> -d                  gss02a_MetaData_8M_3p_2;gss02a_MetaData_8M_3p_3;gss02b_MetaData_8M_3p_1;gss02b_MetaData_8M_3p_2;gss02b_MetaData_8M_3p_3;gss03a_MetaData_8M_3p_1;gss03a_MetaData_8M_3p_2;
> -d                  gss03a_MetaData_8M_3p_3;gss03b_MetaData_8M_3p_1;gss03b_MetaData_8M_3p_2;gss03b_MetaData_8M_3p_3;gss01a_Data_8M_3p_1;gss01a_Data_8M_3p_2;gss01a_Data_8M_3p_3;gss01b_Data_8M_3p_1;
> -d                  gss01b_Data_8M_3p_2;gss01b_Data_8M_3p_3;gss02a_Data_8M_3p_1;gss02a_Data_8M_3p_2;gss02a_Data_8M_3p_3;gss02b_Data_8M_3p_1;gss02b_Data_8M_3p_2;gss02b_Data_8M_3p_3;gss03a_Data_8M_3p_1;
> -d                  gss03a_Data_8M_3p_2;gss03a_Data_8M_3p_3;gss03b_Data_8M_3p_1;gss03b_Data_8M_3p_2;gss03b_Data_8M_3p_3  Disks in file system
> --perfileset-quota  no                       Per-fileset quota enforcement
> -A                  yes                      Automatic mount option
> -o                  none                     Additional mount options
> -T                  /gpfs1                   Default mount point
> --mount-priority    0                        Mount priority
>
> Regards,
> Salvatore
>
> On 14/10/14 17:22, Sven Oehme wrote:
> your GSS code version is very backlevel.
>
> can you please send me the output of mmlsrecoverygroup RGNAME -L --pdisk
> as well as mmlsconfig and mmlsfs all
>
> thx.
> Sven
>
> ------------------------------------------
> Sven Oehme
> Scalable Storage Research
> email: [email protected]
> Phone: +1 (408) 824-8904
> IBM Almaden Research Lab
> ------------------------------------------
>
> From: Salvatore Di Nardo <[email protected]>
> To: [email protected]
> Date: 10/14/2014 08:23 AM
> Subject: Re: [gpfsug-discuss] wait for permission to append to log
> Sent by: [email protected]
>
> On 14/10/14 15:51, Sven Oehme wrote:
> it means there is contention on inserting data into the fast write
> log on the GSS Node, which could be config or workload related
> what GSS code version are you running
> [root@ebi5-251 ~]# mmdiag --version
>
> === mmdiag: version ===
> Current GPFS build: "3.5.0-11 efix1 (888041)".
> Built on Jul 9 2013 at 18:03:32
> Running 6 days 2 hours 10 minutes 35 secs
>
> and how are the nodes connected with each other (Ethernet or IB) ?
> ethernet. they use the same bonding (4x10Gb/s) where the data is
> passing. We don't have admin dedicated network
>
> [root@gss03a ~]# mmlscluster
>
> GPFS cluster information
> ========================
> GPFS cluster name: GSS.ebi.ac.uk
> GPFS cluster id: 17987981184946329605
> GPFS UID domain: GSS.ebi.ac.uk
> Remote shell command: /usr/bin/ssh
> Remote file copy command: /usr/bin/scp
>
> GPFS cluster configuration servers:
> -----------------------------------
> Primary server: gss01a.ebi.ac.uk
> Secondary server: gss02b.ebi.ac.uk
>
> Node  Daemon node name  IP address  Admin node name   Designation
> -----------------------------------------------------------------------
>    1  gss01a.ebi.ac.uk  10.7.28.2   gss01a.ebi.ac.uk  quorum-manager
>    2  gss01b.ebi.ac.uk  10.7.28.3   gss01b.ebi.ac.uk  quorum-manager
>    3  gss02a.ebi.ac.uk  10.7.28.67  gss02a.ebi.ac.uk  quorum-manager
>    4  gss02b.ebi.ac.uk  10.7.28.66  gss02b.ebi.ac.uk  quorum-manager
>    5  gss03a.ebi.ac.uk  10.7.28.34  gss03a.ebi.ac.uk  quorum-manager
>    6  gss03b.ebi.ac.uk  10.7.28.35  gss03b.ebi.ac.uk  quorum-manager
>
> Note: The 3 node "pairs" (gss01, gss02 and gss03) are in different
> subnets because of datacenter constraints (they are not physically
> in the same row, and due to network constraints it was not possible
> to put them in the same subnet). The packets are routed, but that
> should not be a problem as there is 160Gb/s bandwidth between them.
>
> Regards,
> Salvatore
>
> ------------------------------------------
> Sven Oehme
> Scalable Storage Research
> email: [email protected]
> Phone: +1 (408) 824-8904
> IBM Almaden Research Lab
> ------------------------------------------
>
> From: Salvatore Di Nardo <[email protected]>
> To: gpfsug main discussion list <[email protected]>
> Date: 10/14/2014 07:40 AM
> Subject: [gpfsug-discuss] wait for permission to append to log
> Sent by: [email protected]
>
> hello all,
> could someone explain to me the meaning of these waiters?
>
> gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds,
> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750)
> (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
> gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds,
> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750)
> (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
> gss02b.ebi.ac.uk: 0x7F21EA9BD1A0 waiting 0.122115115 seconds,
> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750)
> (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
> gss02b.ebi.ac.uk: 0x7F21EA32FF30 waiting 0.121371877 seconds,
> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750)
> (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
> gss02b.ebi.ac.uk: 0x7F21EA6A1BA0 waiting 0.119322600 seconds,
> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750)
> (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
> gss02b.ebi.ac.uk: 0x7F21EA2E4330 waiting 0.118216774 seconds,
> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750)
> (VdiskLogAppendCondvar), reason 'wait for
permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA72E930 waiting 0.117961594 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6539C0 waiting 0.116133122 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3D3490 waiting 0.116103642 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA85A060 waiting 0.115137978 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4C84A0 waiting 0.115046631 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA229310 waiting 0.114498225 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2AB630 waiting 0.113035120 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA83D9E0 waiting 0.112934666 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA736DC0 waiting 0.112834203 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3A2C20 waiting 0.111498004 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3B2250 waiting 
0.111309423 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAABDF10 waiting 0.110939219 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA27A00 waiting 0.110025022 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA8D6A0 waiting 0.109176110 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2B3AC0 waiting 0.109025355 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2080D0 waiting 0.108702893 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA3AC3A0 waiting 0.107691494 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB460E0 waiting 0.106003854 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2093C0 waiting 0.105781682 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA6FBAE0 waiting 0.105696084 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA573E90 waiting 0.105182795 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > 
(VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA4191E0 waiting 0.104335963 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA82AAE0 waiting 0.104079258 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA538BB0 waiting 0.103798658 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA15DF0 waiting 0.102778144 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA57C320 waiting 0.100503136 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA802700 waiting 0.100499392 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAA5F410 waiting 0.100489143 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA861200 waiting 0.100351636 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA54BAB0 waiting 0.099615942 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAAAFBD0 waiting 0.099477387 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > 
gss02b.ebi.ac.uk: 0x7F21EA657290 waiting 0.099123599 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2BD240 waiting 0.099074074 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA205AF0 waiting 0.097532291 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA477CE0 waiting 0.097311417 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA2F9810 waiting 0.096209425 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA463AF0 waiting 0.096143868 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8B2CB0 waiting 0.094143517 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA7D1E90 waiting 0.093156759 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB473D0 waiting 0.093154775 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EAB03C60 waiting 0.092952495 seconds, > NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750) > (VdiskLogAppendCondvar), reason 'wait for permission to append to log' > gss02b.ebi.ac.uk: 0x7F21EA8766E0 waiting 0.092908405 seconds, > NSDThread: on 
ThCond 0x7F2114005750 (0x7F2114005750)
> (VdiskLogAppendCondvar), reason 'wait for permission to append to log'
>
> Does it mean that the vdisk logs are struggling?
>
> Regards,
> Salvatore
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> [attachment "mmlsrecoverygroup.txt" deleted by Sven Oehme/Almaden/IBM]
> [attachment "mmdiag-config.txt" deleted by Sven Oehme/Almaden/IBM]
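For anyone who wants to quantify a waiter dump like the one quoted in this thread, here is a minimal sketch in Python that counts waiters per node and reason and records the longest wait. It is not a GPFS tool: `summarize_waiters` and `WAITER_RE` are hypothetical names, and the regular expression only assumes the line layout exactly as pasted above (node name, thread address, wait time, thread type, condvar name, reason). It also assumes the reason text never contains a ">" character, because mail quote markers are stripped before matching.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the waiter lines quoted in this thread:
#   <node>: 0x<addr> waiting <secs> seconds, <ThreadType>: on ThCond ...
#   (<CondvarName>), reason '<text>'
WAITER_RE = re.compile(
    r"(?P<node>\S+):\s+0x[0-9A-Fa-f]+\s+waiting\s+(?P<secs>\d+\.\d+)\s+seconds,\s+"
    r"(?P<thread>\w+):.*?\((?P<condvar>\w+)\),\s+reason\s+'(?P<reason>[^']*)'"
)


def summarize_waiters(text):
    """Group waiters by (node, reason); return count and longest wait for each."""
    # Undo mail quoting and line wrapping: drop '>' markers, collapse whitespace.
    flat = " ".join(text.replace(">", " ").split())
    summary = defaultdict(lambda: {"count": 0, "max_secs": 0.0})
    for match in WAITER_RE.finditer(flat):
        entry = summary[(match.group("node"), match.group("reason"))]
        entry["count"] += 1
        entry["max_secs"] = max(entry["max_secs"], float(match.group("secs")))
    return dict(summary)


if __name__ == "__main__":
    # Two waiters copied from the original post, including the quote markers.
    sample = (
        "> gss02b.ebi.ac.uk: 0x7F21EA8541B0 waiting 0.122786709 seconds,\n"
        "> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750)\n"
        "> (VdiskLogAppendCondvar), reason 'wait for permission to append to log'\n"
        "> gss02b.ebi.ac.uk: 0x7F21EA5F4EC0 waiting 0.122770807 seconds,\n"
        "> NSDThread: on ThCond 0x7F2114005750 (0x7F2114005750)\n"
        "> (VdiskLogAppendCondvar), reason 'wait for permission to append to log'\n"
    )
    for (node, reason), stats in summarize_waiters(sample).items():
        print(node, repr(reason), stats["count"], stats["max_secs"])
```

Running this on dumps taken at intervals would show whether the number of threads waiting on the log append and their longest wait are growing under a particular workload, which is the kind of detail asked for at the top of this reply.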
