Switch off CCR and see what happens.

On Mon, 24 Jul 2017, 15:40 Adam Huffman, <[email protected]> wrote:
> smem is recommended here.
>
> Cheers,
> Adam
>
> --
>
> Adam Huffman
> Senior HPC and Cloud Systems Engineer
> The Francis Crick Institute
> 1 Midland Road
> London NW1 1AT
>
> T: 020 3796 1175
> E: [email protected]
> W: www.crick.ac.uk
>
> On 24 Jul 2017, at 15:21, Peter Childs <[email protected]> wrote:
>
> top
>
> but ps gives the same value.
>
> [root@dn29 ~]# ps auww -q 4444
> USER       PID %CPU %MEM      VSZ     RSS TTY      STAT START   TIME COMMAND
> root      4444  2.7 22.3 10537600 5472580 ?        S<Ll Jul12 466:13 /usr/lpp/mmfs/bin/mmfsd
>
> Thanks for the help
>
> Peter.
>
> On Mon, 2017-07-24 at 14:10 +0000, Jim Doherty wrote:
>
> How are you identifying the high memory usage?
>
> On Monday, July 24, 2017 9:30 AM, Peter Childs <[email protected]> wrote:
>
> I've had a look at mmfsadm dump malloc, and it looks to agree with the
> output from mmdiag --memory; it does not seem to account for the excessive
> memory usage.
>
> The new machines do have idleSocketTimeout set to 0. From what you're saying,
> it could be related to keeping that many connections between nodes working.
>
> Thanks in advance
>
> Peter.
>
> [root@dn29 ~]# mmdiag --memory
>
> === mmdiag: memory ===
> mmfsd heap size: 2039808 bytes
>
> Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
>          128 bytes in use
>  17500049370 hard limit on memory usage
>      1048576 bytes committed to regions
>            1 number of regions
>          555 allocations
>          555 frees
>            0 allocation failures
>
> Statistics for MemoryPool id 2 ("Shared Segment")
>     42179592 bytes in use
>  17500049370 hard limit on memory usage
>     56623104 bytes committed to regions
>            9 number of regions
>       100027 allocations
>        79624 frees
>            0 allocation failures
>
> Statistics for MemoryPool id 3 ("Token Manager")
>      2099520 bytes in use
>  17500049370 hard limit on memory usage
>     16778240 bytes committed to regions
>            1 number of regions
>            4 allocations
>            0 frees
>            0 allocation failures
>
> On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote:
>
> There are 3 places where the GPFS mmfsd uses memory: the pagepool plus 2
> shared memory segments. To see the memory utilization of the shared
> memory segments, run the command mmfsadm dump malloc. Memory pool id 2
> is where the maxFilesToCache/maxStatCache objects are, and the manager
> nodes use memory pool id 3 to track the MFTC/MSC objects.
>
> You might want to upgrade to a later PTF, as there was a PTF to fix a memory
> leak that occurred in tscomm associated with network connection drops.
>
> On Monday, July 24, 2017 5:29 AM, Peter Childs <[email protected]> wrote:
>
> We have two GPFS clusters.
>
> One is fairly old, running 4.2.1-2 and non-CCR, and its nodes run
> fine using about 1.5G of memory, which is consistent (the GPFS pagepool is
> set to 1G, so that looks about right).
>
> The other one is "newer", running 4.2.1-3 with CCR, and its nodes keep
> increasing their memory usage: they start at about 1.1G and are fine
> for a few days, but after a while they grow to 4.2G, which means that when
> a node needs to run real work, the work can't be done.
>
> I'm losing track of what may be different other than CCR, and I'm trying
> to find some more ideas of where to look.
>
> I've checked all the standard things like pagepool and maxFilesToCache
> (set to the default of 4000); workerThreads is set to 128 on the new
> GPFS cluster (against the default of 48 on the old one).
>
> I'm not sure what else to look at on this one, hence why I'm asking the
> community.
>
> Thanks in advance
>
> Peter Childs
> ITS Research Storage
> Queen Mary University of London.
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> --
>
> Peter Childs
> ITS Research Storage
> Queen Mary, University of London
>
> The Francis Crick Institute Limited is a registered charity in England and
> Wales no. 1140062 and a company registered in England and Wales no.
> 06885462, with its registered office at 1 Midland Road London NW1 1AT
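Peter's check that the mmdiag --memory pool statistics do not explain the growth can be made concrete by totalling the "bytes in use" lines. This is a minimal POSIX sh/awk sketch, assuming the output format quoted above (each statistics line starts with a number followed by its description); the function name sum_in_use is hypothetical and not part of GPFS.

```shell
#!/bin/sh
# Sketch: total the "bytes in use" figures reported by `mmdiag --memory`
# for the mmfsd memory pools, so the sum can be compared with the RSS
# that ps/top/smem report for the daemon.

# sum_in_use (hypothetical helper name): reads mmdiag-style text on stdin.
# Each statistics line starts with a number followed by a description,
# e.g. "42179592 bytes in use", so the number is awk's first field.
# Lines such as "mmfsd heap size: ..." or "hard limit on memory usage"
# do not match the pattern and are ignored.
sum_in_use() {
  awk '/bytes in use/ { total += $1 }
       END { printf "%.0f\n", total }'
}

# Usage on a node (assumes mmdiag is on PATH, normally /usr/lpp/mmfs/bin):
#   mmdiag --memory | sum_in_use
```

For the three pools on dn29 this gives 128 + 42179592 + 2099520 = 44279240 bytes, roughly 42 MiB, far below the 5472580 KiB (about 5.2 GiB) RSS that ps shows. Per Jim's description, the pagepool is the third consumer and is not part of these pool statistics, so it has to be accounted for separately.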
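To see whether mmfsd grows steadily or in steps (for example around network connection drops, which Jim links to the tscomm leak), the daemon's resident size can be sampled over time. This is a minimal sketch assuming a Linux node with /proc; rss_mib is a hypothetical helper name, and the 10-minute interval in the example is arbitrary. Like the ps and top figures in the thread, VmRSS counts resident shared pages in full; smem, as Adam suggests, can also report proportional (PSS) and unique (USS) figures.

```shell
#!/bin/sh
# Sketch: report a process's resident set size in MiB from /proc (Linux),
# so successive samples show how mmfsd's RSS changes.

# rss_mib (hypothetical helper name): $1 is a PID. The VmRSS field in
# /proc/<pid>/status is given in kB, so dividing by 1024 yields MiB.
rss_mib() {
  awk '/^VmRSS:/ { printf "%.1f\n", $2 / 1024 }' "/proc/$1/status"
}

# Example: log a timestamped sample every 10 minutes.
#   pid=$(pgrep -x mmfsd)
#   while sleep 600; do echo "$(date +%FT%T) $(rss_mib "$pid")"; done
```

As a cross-check, the RSS of 5472580 kB in Peter's ps output corresponds to about 5344.3 MiB in this format.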
