Hello,

We have a 3-way replicated Gluster setup where the clients connect through NFS and the clients are also the servers. The Gluster NFS server process keeps increasing its RAM usage until the machine eventually runs out of memory. We see this on all 3 servers. Each server has 96GB RAM in total, and we've seen the Gluster NFS server use up to 70GB RAM with swap 100% in use. If other processes weren't also using RAM, I guess Gluster would claim that as well.
We are running GlusterFS 3.12.9-1 on Debian 8. The process causing the high memory usage is:

/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/run/gluster/nfs/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/94e073c0dae2c47025351342ba0ddc44.socket

Gluster volume info:

Volume Name: www
Type: Replicate
Volume ID: fbcc21ee-bd0b-40a5-8785-bd00e49e9b72
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.0.0.3:/storage/sdc1/www
Brick2: 10.0.0.2:/storage/sdc1/www
Brick3: 10.0.0.1:/storage/sdc1/www
Options Reconfigured:
diagnostics.client-log-level: ERROR
performance.stat-prefetch: on
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation: on
network.ping-timeout: 3
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off
performance.cache-size: 1GB
performance.write-behind-window-size: 4MB
performance.nfs.io-threads: on
performance.nfs.io-cache: off
performance.nfs.quick-read: off
performance.nfs.write-behind-window-size: 4MB
features.cache-invalidation-timeout: 600
performance.nfs.stat-prefetch: on
network.inode-lru-limit: 90000
performance.cache-priority: *.php:3,*.temp:3,*:1
cluster.readdir-optimize: on
performance.nfs.read-ahead: off
performance.flush-behind: on
performance.write-behind: on
performance.nfs.write-behind: on
performance.nfs.flush-behind: on
features.bitrot: on
features.scrub: Active
performance.quick-read: off
performance.io-thread-count: 64
nfs.enable-ino32: on
nfs.log-level: ERROR
storage.build-pgfid: off
diagnostics.brick-log-level: WARNING
cluster.self-heal-daemon: enable

We don't see anything in the logs that looks like it could explain the high memory usage.
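To put numbers on the growth over time, something along these lines could record the resident and swapped memory of the NFS process. It is only a minimal sketch: the PID file path is the one from the command line above, the function names are hypothetical, and it assumes a Linux /proc filesystem that reports VmRSS and VmSwap in /proc/<pid>/status.

```python
import re
import time
from pathlib import Path

# PID file path taken from the glusterfs command line above.
PID_FILE = Path("/var/run/gluster/nfs/nfs.pid")


def parse_status(text):
    """Extract VmRSS and VmSwap (in kB) from the text of /proc/<pid>/status."""
    usage = {}
    for line in text.splitlines():
        match = re.match(r"(VmRSS|VmSwap):\s+(\d+)\s+kB", line)
        if match:
            usage[match.group(1)] = int(match.group(2))
    return usage


def read_usage(pid_file=PID_FILE):
    """Read the PID from the PID file and return its current memory usage."""
    pid = int(pid_file.read_text().strip())
    return parse_status(Path(f"/proc/{pid}/status").read_text())


def log_usage(samples, interval_s=60, pid_file=PID_FILE):
    """Print one timestamped sample every interval_s seconds (hypothetical helper)."""
    for i in range(samples):
        print(time.strftime("%Y-%m-%dT%H:%M:%S"), read_usage(pid_file), flush=True)
        if i + 1 < samples:
            time.sleep(interval_s)
```

Calling log_usage(1440) would cover roughly a day at one sample per minute. Unlike a statedump, this only reads /proc and sends no signal to the process.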
We did make a statedump, which I have posted here and also attached: https://pastebin.com/raw/sDNF1wwi

Running the command to get the statedump is quite dangerous for us, as the USR1 signal appeared to cause Gluster to move swapped memory back into RAM and go offline while this was in progress. FWIW we do have vm.swappiness set to 1.

Does anyone have an idea of what could cause this and what we can do to stop such high memory usage?

Cheers,
Niels
Attachment: glusterdump.2076.dump.1527500065 (binary data)
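For anyone going through the statedump, a rough sketch like the one below could rank its memory-accounting sections by current size. It assumes the dump uses the usual statedump layout, where each such section has a bracketed header ending in "memusage" followed by key=value lines such as size and num_allocs; the function name is hypothetical, and the attached dump has not been checked against this layout.

```python
import re


def top_memusage(dump_text, limit=10):
    """Return the memusage sections with the largest 'size', biggest first."""
    sections = []
    in_memusage = False
    for raw in dump_text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"\[(.+) memusage\]", line)
        if header:
            # Start of a memory-accounting section.
            sections.append({"name": header.group(1)})
            in_memusage = True
        elif line.startswith("["):
            # Any other section header ends the current memusage section.
            in_memusage = False
        elif in_memusage and "=" in line:
            key, _, value = line.partition("=")
            if value.isdigit():
                sections[-1][key] = int(value)
    sections.sort(key=lambda s: s.get("size", 0), reverse=True)
    return sections[:limit]
```

The text of the dump file (for example glusterdump.2076.dump.1527500065, read with errors="replace") can be passed in as dump_text. Sections that hold a large size together with a large num_allocs would be the first places to look.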
