Hi List,

on our MDS we noticed that all memory seems to be used. (And it's not
just normal buffers/cache as far as I can tell.)
When we put load on the machine, for example by starting rsync on a few
clients (generating file lists to copy data from Lustre to local disks)
or just by running an MDT backup locally, using dd/gzip to copy an LVM
snapshot to a remote server, kswapd starts using a lot of CPU time,
sometimes up to 100% of one CPU core.

This is on a Lustre 1.6.7.2.ddn3.5 based file system of about 200TB; the
MDT is 800GB with 200M inodes and ACLs enabled. Most of the memory
appears to be used by the kernel, and according to slabtop a large part
of it sits in the ldlm_locks and ldlm_resources slab caches.

Some details are below, but the main question we now have is whether or
not this is normal and expected. Is there a tunable to make Lustre use a
bit less slab memory than it currently does? (The only candidate I have
found so far is noted below, after the memory details.) Will adding more
memory to this machine solve the problem that there seems to be not
enough memory left to run normal processes, or will it just delay the
problem?

Kind regards,
Frederik

Memory details:

<snip>

[r...@cs04r-sc-mds01-01 proc]# free
             total       used       free     shared    buffers     cached
Mem:      16497436   16146416     351020          0     257624      17836
-/+ buffers/cache:   15870956     626480
Swap:      2031608     322768    1708840

[r...@cs04r-sc-mds01-01 proc]# cat /proc/meminfo
MemTotal:     16497436 kB
MemFree:        352004 kB
Buffers:        256084 kB
Cached:          17688 kB
SwapCached:     149544 kB
Active:         200764 kB
Inactive:       255344 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     16497436 kB
LowFree:        352004 kB
SwapTotal:     2031608 kB
SwapFree:     1708840 kB
Dirty:             268 kB
Writeback:           0 kB
AnonPages:      182272 kB
Mapped:          17528 kB
Slab:         15248816 kB
PageTables:       6984 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  10280324 kB
Committed_AS:  1321284 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    330740 kB
VmallocChunk: 34359394255 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

[r...@cs04r-sc-mds01-01 proc]# slabtop --once | head -15
 Active / Total Objects (% used)    : 30350433 / 38705406 (78.4%)
 Active / Total Slabs (% used)      : 3801362 / 3801369 (100.0%)
 Active / Total Caches (% used)     : 114 / 168 (67.9%)
 Active / Total Size (% used)       : 12325021.07K / 14610074.85K (84.4%)
 Minimum / Average / Maximum Object : 0.02K / 0.38K / 128.00K

    OBJS   ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
15657800 14362022  91%    0.50K 1957225        8  7828900K ldlm_locks
10165900  9719990  95%    0.38K 1016590       10  4066360K ldlm_resources
 3650979  1038530  28%    0.06K   61881       59   247524K size-64
 3646620  3159662  86%    0.12K  121554       30   486216K size-128
 3099906   863841  27%    0.21K  172217       18   688868K dentry_cache
 1679436   859267  51%    0.83K  419859        4  1679436K ldiskfs_inode_cache
  460725   133164  28%    0.25K   30715       15   122860K size-256
  122440    65022  53%    0.09K    3061       40    12244K buffer_head
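The only candidate tunable I have found so far is the per-namespace LDLM
lock LRU size (lru_size under ldlm.namespaces). I haven't actually tried
this yet, so the commands below are only a sketch of what I mean, not
something I know to be safe: the limit of 1200 is an arbitrary example,
and as far as I understand the locks are held on behalf of the clients,
so it would presumably have to be set on the clients rather than on the
MDS itself:

  # show how many locks each namespace currently caches
  lctl get_param ldlm.namespaces.*.lock_count

  # pin the lock LRU to a fixed size (0 apparently re-enables the
  # dynamic LRU sizing that is the default in recent 1.6 releases)
  lctl set_param ldlm.namespaces.*.lru_size=1200

  # or drop the cached locks entirely
  for f in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do
      echo clear > $f
  done

If that is the right knob, I'd still be interested to hear whether fixing
the LRU size at some value is recommended, or whether it just fights the
dynamic resizing.

-- 
Frederik Ferner
Computer Systems Administrator        phone: +44 1235 77 8624
Diamond Light Source Ltd.             mob:   +44 7917 08 5110

(Apologies in advance for the lines below. Some bits are a legal
requirement and I have no control over them.)

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss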