On 21 February 2018 at 21:11, Dan Ragle <[email protected]> wrote:
> On 2/3/2018 8:58 AM, Dan Ragle wrote:
>> On 2/2/2018 2:13 AM, Nithya Balachandran wrote:
>>> Hi Dan,
>>>
>>> It sounds like you might be running into [1]. The patch has been posted
>>> upstream and the fix should be in the next release.
>>> In the meantime, I'm afraid there is no way to get around this without
>>> restarting the process.
>>>
>>> Regards,
>>> Nithya
>>>
>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1541264
>>>
>> Much appreciated. Will watch for the next release and retest then.
>>
>> Cheers!
>>
>> Dan
>>
> FYI, this looks like it's fixed in 3.12.6. Ran the test setup with
> repeated ls listings for just shy of 48 hours with no increase in RAM
> usage. Next will try my production application load for awhile to see if
> it holds steady.
>
> The gf_dht_mt_dht_layout_t memusage num_allocs went quickly up to 105415
> and then stayed there for the entire 48 hours.
>

Excellent. Thanks for letting us know.

Nithya

> Thanks for the quick response,
>
> Dan
>
>>> On 2 February 2018 at 02:57, Dan Ragle <[email protected]> wrote:
>>>
>>> On 1/30/2018 6:31 AM, Raghavendra Gowdappa wrote:
>>>
>>> ----- Original Message -----
>>> From: "Dan Ragle" <[email protected]>
>>> To: "Raghavendra Gowdappa" <[email protected]>, "Ravishankar N" <[email protected]>
>>> Cc: [email protected], "Csaba Henk" <[email protected]>, "Niels de Vos" <[email protected]>, "Nithya Balachandran" <[email protected]>
>>> Sent: Monday, January 29, 2018 9:02:21 PM
>>> Subject: Re: [Gluster-users] Run away memory with gluster mount
>>>
>>> On 1/29/2018 2:36 AM, Raghavendra Gowdappa wrote:
>>>
>>> ----- Original Message -----
>>> From: "Ravishankar N" <[email protected]>
>>> To: "Dan Ragle" <[email protected]>, [email protected]
>>> Cc: "Csaba Henk" <[email protected]>, "Niels de Vos" <[email protected]>, "Nithya Balachandran" <[email protected]>, "Raghavendra Gowdappa" <[email protected]>
>>> Sent: Saturday, January 27, 2018 10:23:38 AM
>>> Subject: Re: [Gluster-users] Run away memory with gluster mount
>>>
>>> On 01/27/2018 02:29 AM, Dan Ragle wrote:
>>>
>>> On 1/25/2018 8:21 PM, Ravishankar N wrote:
>>>
>>> On 01/25/2018 11:04 PM, Dan Ragle wrote:
>>>
>>> *sigh* trying again to correct formatting ... apologize for the
>>> earlier mess.
>>>
>>> Having a memory issue with Gluster 3.12.4 and not sure how to
>>> troubleshoot. I don't *think* this is expected behavior.
>>>
>>> This is on an updated CentOS 7 box. The setup is a simple two node
>>> replicated layout where the two nodes act as both server and client.
>>>
>>> The volume in question:
>>>
>>> Volume Name: GlusterWWW
>>> Type: Replicate
>>> Volume ID: 8e9b0e79-f309-4d9b-a5bb-45d065faaaa3
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: vs1dlan.mydomain.com:/glusterfs_bricks/brick1/www
>>> Brick2: vs2dlan.mydomain.com:/glusterfs_bricks/brick1/www
>>> Options Reconfigured:
>>> nfs.disable: on
>>> cluster.favorite-child-policy: mtime
>>> transport.address-family: inet
>>>
>>> I had some other performance options in there (increased cache-size, md
>>> invalidation, etc.) but stripped them out in an attempt to isolate the
>>> issue. Still got the problem without them.
>>>
>>> The volume currently contains over 1M files.
>>>
>>> When mounting the volume, I get (among other things) a process as such:
>>>
>>> /usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/GlusterWWW /var/www
>>>
>>> This process begins with little memory, but then as files are accessed
>>> in the volume the memory increases. I set up a script that simply reads
>>> the files in the volume one at a time (no writes). It's been running on
>>> and off for about 12 hours now and the resident memory of the above
>>> process is already at 7.5G and continues to grow slowly. If I stop the
>>> test script the memory stops growing, but does not reduce. Restart the
>>> test script and the memory begins slowly growing again.
>>>
>>> This is obviously a contrived app environment. With my intended
>>> application load it takes about a week or so for the memory to get high
>>> enough to invoke the oom killer.
>>>
>>> Can you try debugging with the statedump
>>> (https://gluster.readthedocs.io/en/latest/Troubleshooting/statedump/#read-a-statedump)
>>> of the fuse mount process and see what member is leaking? Take the
>>> statedumps in succession, maybe once initially during the I/O and once
>>> the memory gets high enough to hit the OOM mark. Share the dumps here.
>>>
>>> Regards,
>>> Ravi
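
A minimal sketch of how such dumps can be taken, assuming the default dump
directory /var/run/gluster and a single fuse mount process for this volume
(the exact procedure used later in this thread may differ):

    # find the PID of the fuse mount process for the GlusterWWW volume
    pgrep -f 'volfile-id=/GlusterWWW'
    # ask it to write a statedump; a file named
    # glusterdump.<pid>.dump.<timestamp> should appear under /var/run/gluster
    kill -USR1 "$(pgrep -f 'volfile-id=/GlusterWWW')"
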
>>>
>>> Thanks for the reply. I noticed yesterday that an update (3.12.5) had
>>> been posted, so I went ahead and updated and repeated the test
>>> overnight. The memory usage does not appear to be growing as quickly as
>>> it was with 3.12.4, but does still appear to be growing.
>>>
>>> I should also mention that there is another process beyond my test app
>>> that is reading the files from the volume. Specifically, there is an
>>> rsync that runs from the second node 2-4 times an hour that reads from
>>> the GlusterWWW volume mounted on node 1. Since none of the files in that
>>> mount are changing it doesn't actually rsync anything, but nonetheless
>>> it is running and reading the files in addition to my test script. (It's
>>> a part of my intended production setup that I forgot was still running.)
>>>
>>> The mount process appears to be gaining memory at a rate of about 1GB
>>> every 4 hours or so. At that rate it'll take several days before it runs
>>> the box out of memory. But I took your suggestion and made some
>>> statedumps today anyway, about 2 hours apart, 4 total so far. It looks
>>> like there may already be some actionable information. These are the
>>> only registers where the num_allocs have grown with each of the four
>>> samples:
>>>
>>> [mount/fuse.fuse - usage-type gf_fuse_mt_gids_t memusage]
>>> ---> num_allocs at Fri Jan 26 08:57:31 2018: 784
>>> ---> num_allocs at Fri Jan 26 10:55:50 2018: 831
>>> ---> num_allocs at Fri Jan 26 12:55:15 2018: 877
>>> ---> num_allocs at Fri Jan 26 14:58:27 2018: 908
>>>
>>> [mount/fuse.fuse - usage-type gf_common_mt_fd_lk_ctx_t memusage]
>>> ---> num_allocs at Fri Jan 26 08:57:31 2018: 5
>>> ---> num_allocs at Fri Jan 26 10:55:50 2018: 10
>>> ---> num_allocs at Fri Jan 26 12:55:15 2018: 15
>>> ---> num_allocs at Fri Jan 26 14:58:27 2018: 17
>>>
>>> [cluster/distribute.GlusterWWW-dht - usage-type gf_dht_mt_dht_layout_t memusage]
>>> ---> num_allocs at Fri Jan 26 08:57:31 2018: 24243596
>>> ---> num_allocs at Fri Jan 26 10:55:50 2018: 27902622
>>> ---> num_allocs at Fri Jan 26 12:55:15 2018: 30678066
>>> ---> num_allocs at Fri Jan 26 14:58:27 2018: 33801036
>>>
>>> Not sure of the best way to get you the full dumps. They're pretty big,
>>> over 1G for all four. Also, I noticed some filepath information in there
>>> that I'd rather not share. What's the recommended next step?
>>>
>>> Please run the following queries on the statedump files and report the
>>> results:
>>>
>>> # grep itable <client-statedump> | grep active | wc -l
>>> # grep itable <client-statedump> | grep active_size
>>> # grep itable <client-statedump> | grep lru | wc -l
>>> # grep itable <client-statedump> | grep lru_size
>>> # grep itable <client-statedump> | grep purge | wc -l
>>> # grep itable <client-statedump> | grep purge_size
>>>
>>> Had to restart the test; it has been running for 36 hours now. RSS is
>>> currently up to 23g.
>>>
>>> Working on getting a bug report with a link to the dumps. In the
>>> meantime, I'm including the results of your queries above for the first
>>> dump, the 18 hour dump, and the 36 hour dump:
>>>
>>> # grep itable glusterdump.153904.dump.1517104561 | grep active | wc -l
>>> 53865
>>> # grep itable glusterdump.153904.dump.1517169361 | grep active | wc -l
>>> 53864
>>> # grep itable glusterdump.153904.dump.1517234161 | grep active | wc -l
>>> 53864
>>>
>>> # grep itable glusterdump.153904.dump.1517104561 | grep active_size
>>> xlator.mount.fuse.itable.active_size=53864
>>> # grep itable glusterdump.153904.dump.1517169361 | grep active_size
>>> xlator.mount.fuse.itable.active_size=53863
>>> # grep itable glusterdump.153904.dump.1517234161 | grep active_size
>>> xlator.mount.fuse.itable.active_size=53863
>>>
>>> # grep itable glusterdump.153904.dump.1517104561 | grep lru | wc -l
>>> 998510
>>> # grep itable glusterdump.153904.dump.1517169361 | grep lru | wc -l
>>> 998510
>>> # grep itable glusterdump.153904.dump.1517234161 | grep lru | wc -l
>>> 995992
>>>
>>> # grep itable glusterdump.153904.dump.1517104561 | grep lru_size
>>> xlator.mount.fuse.itable.lru_size=998508
>>> # grep itable glusterdump.153904.dump.1517169361 | grep lru_size
>>> xlator.mount.fuse.itable.lru_size=998508
>>> # grep itable glusterdump.153904.dump.1517234161 | grep lru_size
>>> xlator.mount.fuse.itable.lru_size=995990
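
A sketch of pulling that DHT layout counter out of each dump for a
side-by-side comparison, assuming the dumps keep their default
glusterdump.<pid>.dump.<timestamp> names, sit in the current directory,
and carry a num_allocs= line in each memusage section as shown above:

    for d in glusterdump.*.dump.*; do
        echo "== $d"
        # allocation counter of the gf_dht_mt_dht_layout_t memusage section
        grep -A5 'gf_dht_mt_dht_layout_t memusage' "$d" | grep '^num_allocs='
    done
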
>>>
>>> Around 1 million inodes in the lru table!! These are inodes the kernel
>>> has cached but on which no operation is currently in progress. This
>>> could be the reason for the high memory usage.
>>>
>>> We have a patch being worked on (currently merged on the experimental
>>> branch) [1] that will help in these scenarios. In the meantime, can you
>>> remount glusterfs with the options --entry-timeout=0 and
>>> --attribute-timeout=0? This will make sure that the kernel won't cache
>>> inodes/attributes of the files and should bring down the memory usage.
>>>
>>> I am curious to know what your data set is like. Is it a case of too
>>> many directories and files present in deep directories? I am wondering
>>> whether a significant number of the inodes cached by the kernel are
>>> there to hold dentry structures in the kernel.
>>>
>>> [1] https://review.gluster.org/#/c/18665/
>>>
>>> OK, remounted with your recommended attributes and repeated the test.
>>> Now the mount process looks like this:
>>>
>>> /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0 --volfile-server=localhost --volfile-id=/GlusterWWW /var/www
>>>
>>> However, after running for 36 hours it's again at about 23g (about the
>>> same place it was on the first test).
>>>
>>> A few metrics from the 36 hour mark:
>>>
>>> num_allocs for [cluster/distribute.GlusterWWW-dht - usage-type
>>> gf_dht_mt_dht_layout_t memusage] is 109140094. Seems at least somewhat
>>> similar to the original test, which had 117901593 at the 36 hour mark.
>>>
>>> The dump file at the 36 hour mark had nothing for lru or lru_size.
>>> However, the dump two hours prior had:
>>>
>>> # grep itable glusterdump.67299.dump.1517493361 | grep lru | wc -l
>>> 998510
>>> # grep itable glusterdump.67299.dump.1517493361 | grep lru_size
>>> xlator.mount.fuse.itable.lru_size=998508
>>>
>>> and the same thing for the dump four hours later. Are these values only
>>> relevant when the ls -R is actually running? I'm thinking the 36 hour
>>> dump may have caught the ls -R between runs there (?)
>>>
>>> The data set is multiple Web sites. I know there's some litter there we
>>> can clean up, but I'd guess not more than 200-300k files or so. The
>>> biggest culprit is a single directory that we use as a multi-purpose
>>> file store, with filenames stored as GUIDs and linked to a DB. That
>>> directory currently has 500k+ files. Another directory serves a similar
>>> purpose and has about 66k files in it. The rest is generally distributed
>>> more "normally", i.e., a mixed nesting of directories and files.
>>>
>>> Cheers!
>>>
>>> Dan
>>>
>>> # grep itable glusterdump.153904.dump.1517104561 | grep purge | wc -l
>>> 1
>>> # grep itable glusterdump.153904.dump.1517169361 | grep purge | wc -l
>>> 1
>>> # grep itable glusterdump.153904.dump.1517234161 | grep purge | wc -l
>>> 1
>>>
>>> # grep itable glusterdump.153904.dump.1517104561 | grep purge_size
>>> xlator.mount.fuse.itable.purge_size=0
>>> # grep itable glusterdump.153904.dump.1517169361 | grep purge_size
>>> xlator.mount.fuse.itable.purge_size=0
>>> # grep itable glusterdump.153904.dump.1517234161 | grep purge_size
>>> xlator.mount.fuse.itable.purge_size=0
>>>
>>> Cheers,
>>>
>>> Dan
>>>
>>> I've CC'd the fuse/dht devs to see if these data types have potential
>>> leaks. Could you raise a bug with the volume info and a (dropbox?) link
>>> from which we can download the dumps? You can remove/replace the
>>> filepaths from them.
>>>
>>> Regards,
>>> Ravi
>>>
>>> Cheers!
>>>
>>> Dan
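
For reference, the remount suggested above can also be expressed through
the native mount helper rather than the raw glusterfs command line; a
sketch assuming the volume is mounted from localhost at /var/www and that
mount.glusterfs accepts the matching attribute-timeout/entry-timeout
options:

    umount /var/www
    mount -t glusterfs -o attribute-timeout=0,entry-timeout=0 localhost:/GlusterWWW /var/www
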
>>>
>>> Is there potentially something misconfigured here?
>>>
>>> I did see a reference to a memory leak in another thread in this list,
>>> but that had to do with the setting of quotas; I don't have any quotas
>>> set on my system.
>>>
>>> Thanks,
>>>
>>> Dan Ragle
>>> [email protected]
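
The kind of read-only walk and resident-memory check described earlier in
the thread could look roughly like the following; this is only a sketch
(the pgrep pattern, the assumption of a single matching mount process, and
the 5-minute interval are illustrative, not Dan's actual script):

    # one pass over the volume, reading every file, while logging the
    # fuse mount's RSS until the walk finishes
    MNTPID=$(pgrep -f 'volfile-id=/GlusterWWW')
    find /var/www -type f -exec cat {} + > /dev/null &
    WALK=$!
    while kill -0 "$WALK" 2>/dev/null; do
        echo "$(date) RSS(kB): $(ps -o rss= -p "$MNTPID")"
        sleep 300
    done
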
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
