Slow memory release could also be because of tcmalloc. Tcmalloc doesn't release 
memory back to the OS the moment the application issues a 'delete'; it caches 
it internally for future use.
If it is not a production cluster and you have spare time to reproduce this, I 
would suggest building Ceph with jemalloc and watching the behavior. It should 
release memory much faster than tcmalloc. Basically, the behavior of jemalloc 
is similar to glibc malloc.
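The same caching happens, to a lesser degree, in glibc malloc, and on Linux you can force the release by hand with malloc_trim(3) (tcmalloc's analogue is MallocExtension::ReleaseFreeMemory()). A minimal sketch of the allocate/free/trim cycle, assuming Linux/glibc and calling libc directly from Python via ctypes:

```python
import ctypes

# Linux/glibc assumed; on other platforms the library name and
# malloc_trim() will not exist.
libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]

# Allocate and free ~100 MB in small chunks. After the frees, the
# allocator may keep the pages cached in its arenas instead of
# returning them to the kernel -- like tcmalloc's free-list caches.
chunks = [libc.malloc(4096) for _ in range(25600)]
for p in chunks:
    libc.free(p)

# malloc_trim(0) asks glibc to hand free heap pages back to the OS;
# it returns nonzero if any memory was actually released.
released = libc.malloc_trim(0)
print("malloc_trim released memory:", bool(released))
```

This only demonstrates the mechanism; whether a long-running OSD actually shrinks depends on heap fragmentation, not just on how much has been freed.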

Thanks & Regards
Somnath


-----Original Message-----
From: ceph-users [mailto:[email protected]] On Behalf Of Haomai 
Wang
Sent: Thursday, August 27, 2015 7:31 PM
To: Chad William Seys
Cc: [email protected]
Subject: Re: [ceph-users] RAM usage only very slowly decreases after cluster 
recovery

Yes, we already noticed this, and I think we have a PR that fixes part of it: 
https://github.com/ceph/ceph/pull/5451/files

On Fri, Aug 28, 2015 at 4:59 AM, Chad William Seys <[email protected]> 
wrote:
> Hi all,
>
> It appears that OSD daemons only very slowly free RAM after an
> extended period of an unhealthy cluster (shuffling PGs around).
>
> Prior to a power outage (and recovery) around July 25th, the amount of
> RAM used was fairly constant, at most 10GB (out of 24GB).  You can see
> in the attached PNG "osd6_stack2.png" (Week 30) that the amount of
> used RAM on osd06.physics.wisc.edu was holding steady around 7GB.
>
> Around July 25th our Ceph cluster rebooted after a power outage.  Not
> all nodes booted successfully, so Ceph proceeded to shuffle PGs to
> attempt to return to health with the remaining nodes.  You can see in
> "osd6_stack2.png" two purplish spikes showing that the node used
> around 10GB swap space during the recovery period.
>
> Finally the cluster recovered around July 31st.  During that period I
> had to take some OSD daemons out of the pool because their nodes ran
> out of swap space and the daemons were killed by the kernel's
> out-of-memory (OOM) killer.  (The recovery period was probably
> extended by my trying to add the daemons/drives back; if I recall
> correctly, that is what was occurring during the second swap peak.)
>
> This RAM usage pattern is in general the same for all the nodes in the cluster.
>
> Almost three weeks later, the amount of RAM used on the node is still
> decreasing, but it has not returned to pre-power-outage levels: 15GB
> instead of 7GB.
>
> Why is Ceph using 2x more RAM than it used to in steady state?
>
> Thanks,
> Chad.
>
> (P.S.  It is really unfortunate that Ceph uses more RAM when
> recovering; it can lead to cascading failure!)
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



--
Best Regards,

Wheat


