From: Kaleb S. KEITHLEY [mailto:kkeit...@redhat.com]
> On 10/28/2015 02:55 PM, Frank Filz wrote:
> > We have had various discussions over the years as to how to best
> > handle out of memory conditions.
>
> a) I see that, e.g., jemalloc's mallctl(3) has the ability to selectively
> purge unused dirty pages from one or all arenas.
>
> I don't know what that actually means in this case. Maybe it means that a
> malloc() call might succeed on a retry after purging?
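If that's the idea, the retry path might look something like the sketch
below. Completely untested and hedged: the mallctl names are the ones the
jemalloc 3.x/4.x manual documents (where "arena.<i>.purge" with i equal to
arenas.narenas purges every arena), and malloc_with_purge_retry() is just a
hypothetical wrapper, not anything we have in the tree:

    #include <stdio.h>
    #include <stdlib.h>
    #include <jemalloc/jemalloc.h>

    /* On malloc() failure, ask jemalloc to purge unused dirty pages
     * from all arenas, then retry once. */
    static void *malloc_with_purge_retry(size_t size)
    {
            void *p = malloc(size);

            if (p == NULL) {
                    unsigned narenas;
                    size_t len = sizeof(narenas);

                    if (mallctl("arenas.narenas", &narenas, &len,
                                NULL, 0) == 0) {
                            char cmd[32];

                            snprintf(cmd, sizeof(cmd),
                                     "arena.%u.purge", narenas);
                            (void) mallctl(cmd, NULL, NULL, NULL, 0);
                    }
                    p = malloc(size); /* one retry after the purge */
            }
            return p;
    }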
Even so, I'm not sure what purging might buy us. It isn't going to release
memory Ganesha has leaked, or reduce Ganesha's memory footprint. It might
make Ganesha play nicer with other processes (if Ganesha had a spike in
memory usage, purging could shrink its footprint afterward), but I don't
think it would make Ganesha's own malloc any more likely to succeed: if
Ganesha has unused heap pages, its malloc should already find space in them.

I guess one place it might help is breaking up page-sized fragmentation: if
Ganesha's heap has lots of holes of <= 2 pages and then needs a > 2 page
allocation, releasing a bunch of those single pages might let the heap grow
at the top end and satisfy the 3-page allocation there. But I don't know
enough about memory allocators to say whether that's remotely how they
work...

> Then my only concern is that while Fedora and EPEL builds of nfs-ganesha
> are made _with_ jemalloc, our downstream RHGS product is _not_ built with
> jemalloc.
>
> Maintaining the source to work with either might be an unnecessary bit of
> complexity that we don't need or want.
>
> b) Some allocators will munmap() (or negative sbrk()) to return memory
> when possible. It looks like jemalloc will do that if it's configured to
> do so when it's built. Need to confirm how jemalloc is built.
>
> c) On the whole though, I believe our RHGS users should be deploying on
> machines with lots of RAM. If ganesha.nfsd gets ENOMEM on malloc() on
> one of these then the machine as a whole is probably in an unhealthy
> state. On Linux, if the whole box is under memory pressure, even if
> ganesha.nfsd is still calling malloc(3) successfully, the OOM killer
> might kill ganesha.nfsd anyway if the heuristics the OOM killer uses
> decide that it, as possibly the largest consumer of memory, is the best
> candidate to kill.

I agree: if Ganesha's memory footprint is huge, the OOM killer is going to
come hunting... And I also agree that if overall system memory pressure is
high, the system as a whole is likely unstable. Restarting Ganesha, as one
of the bigger memory consumers, might help with that (and if a Ganesha
restart triggers HA to restart the whole node, that would certainly clear
the problem). Of course, a restart (or any other solution short of a serious
memory diet, which we should pursue wherever possible anyway) won't help if
the workload really is too big for the system (and then a memory diet
probably just means the users manage to make their workload even bigger :-).

> d) And are we sure that there aren't any memory leaks still to be
> uncovered? I'd almost rather spend time on that than devote a lot of
> energy to trying to keep running in a marginal situation.

Yes, we should continue to hunt down memory leaks. We should also make sure
our allocator is working well to prevent memory fragmentation (which is
effectively also a memory leak, just one that's harder to control from
source code).

Frank
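P.S. On (b), "need to confirm how jemalloc is built": if we're linked
against jemalloc we may be able to ask it directly instead of digging
through build logs. Another hedged sketch; it assumes the read-only
"config.munmap" mallctl that the jemalloc 3.x/4.x manuals document, which a
differently built or newer jemalloc may not expose:

    #include <stdbool.h>
    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    int main(void)
    {
            bool munmap_enabled;
            size_t len = sizeof(munmap_enabled);

            /* "config.munmap" reports whether --enable-munmap was
             * specified at build time, i.e. whether jemalloc will
             * munmap() chunks back to the OS when it can. */
            if (mallctl("config.munmap", &munmap_enabled, &len,
                        NULL, 0) == 0)
                    printf("jemalloc munmap: %s\n",
                           munmap_enabled ? "enabled" : "disabled");
            else
                    printf("config.munmap not exposed here\n");
            return 0;
    }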
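P.P.S. On the fragmentation question: before we guess at remedies we could
at least measure it. jemalloc's global counters give a rough picture --
active minus allocated is roughly the waste inside in-use pages, and mapped
minus active is pages jemalloc holds but isn't using. A sketch, assuming
stats were enabled when jemalloc was built (they are by default, as far as
I know):

    #include <stdint.h>
    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    static void print_heap_stats(void)
    {
            uint64_t epoch = 1;
            size_t elen = sizeof(epoch);
            size_t allocated, active, mapped;
            size_t slen = sizeof(size_t);

            /* Stats are cached; writing to "epoch" refreshes them. */
            mallctl("epoch", &epoch, &elen, &epoch, sizeof(epoch));
            mallctl("stats.allocated", &allocated, &slen, NULL, 0);
            mallctl("stats.active", &active, &slen, NULL, 0);
            mallctl("stats.mapped", &mapped, &slen, NULL, 0);

            printf("allocated=%zu active=%zu mapped=%zu\n",
                   allocated, active, mapped);
    }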
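P.P.P.S. On the OOM-killer point in (c): Linux does let a process see and
influence its own badness score via /proc. Whether we'd want ganesha.nfsd to
deprioritize itself is a policy question, but for completeness, a Linux-only
sketch (lowering the score needs CAP_SYS_RESOURCE, and the -500 value is an
arbitrary example):

    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/proc/self/oom_score", "r");
            int score;

            /* Current badness score as the OOM killer sees it. */
            if (f != NULL) {
                    if (fscanf(f, "%d", &score) == 1)
                            printf("oom_score: %d\n", score);
                    fclose(f);
            }

            /* Negative adjustments make us a less likely victim. */
            f = fopen("/proc/self/oom_score_adj", "w");
            if (f != NULL) {
                    fprintf(f, "-500\n");
                    fclose(f);
            }
            return 0;
    }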