From: Kaleb S. KEITHLEY [mailto:kkeit...@redhat.com]
> On 10/28/2015 02:55 PM, Frank Filz wrote:
> > We have had various discussions over the years as to how to best
> > handle out of memory conditions.
>
> a) I see that, e.g., jemalloc's mallctl(3) has the ability to selectively
> purge unused dirty pages from one or all arenas.
>
> I don't know what that actually means in this case. Maybe it means that a
> malloc() call might succeed on a retry after purging?
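If that's the idea, the retry path might look something like the sketch
below. Completely untested and hedged: the mallctl names are the ones the
jemalloc 3.x/4.x manual documents (where "arena.<i>.purge" with i equal to
arenas.narenas purges every arena), and malloc_with_purge_retry() is just a
hypothetical wrapper, not anything we have in the tree:

    #include <stdio.h>
    #include <stdlib.h>
    #include <jemalloc/jemalloc.h>

    /* On malloc() failure, ask jemalloc to purge unused dirty pages
     * from all arenas, then retry once. */
    static void *malloc_with_purge_retry(size_t size)
    {
            void *p = malloc(size);

            if (p == NULL) {
                    unsigned narenas;
                    size_t len = sizeof(narenas);

                    if (mallctl("arenas.narenas", &narenas, &len,
                                NULL, 0) == 0) {
                            char cmd[32];

                            snprintf(cmd, sizeof(cmd),
                                     "arena.%u.purge", narenas);
                            (void) mallctl(cmd, NULL, NULL, NULL, 0);
                    }
                    p = malloc(size); /* one retry after the purge */
            }
            return p;
    }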
Even so, I'm not sure what purging might buy us. It isn't going to release
memory Ganesha has leaked, or reduce Ganesha's memory footprint. It might
make Ganesha play nicer with other processes (if Ganesha had a spike in
memory usage, purging could shrink its footprint afterward), but I don't
think it would make Ganesha's own malloc any more likely to succeed: if
Ganesha has unused heap pages, its malloc should already find space in them.

I guess one place it might help is breaking up page-sized fragmentation: if
Ganesha's heap has lots of holes of <= 2 pages and then needs a > 2 page
allocation, releasing a bunch of those single pages might let the heap grow
at the top end and satisfy the 3-page allocation there. But I don't know
enough about memory allocators to say whether that's remotely how they
work...

> Then my only concern is that while Fedora and EPEL builds of nfs-ganesha
> are made _with_ jemalloc, our downstream RHGS product is _not_ built with
> jemalloc.
>
> Maintaining the source to work with either might be an unnecessary bit of
> complexity that we don't need or want.
>
> b) Some allocators will munmap() (or negative sbrk()) to return memory
> when possible. It looks like jemalloc will do that if it's configured to
> do so when it's built. Need to confirm how jemalloc is built.
>
> c) On the whole though, I believe our RHGS users should be deploying on
> machines with lots of RAM. If ganesha.nfsd gets ENOMEM on malloc() on
> one of these then the machine as a whole is probably in an unhealthy
> state. On Linux, if the whole box is under memory pressure, even if
> ganesha.nfsd is still calling malloc(3) successfully, the OOM killer
> might kill ganesha.nfsd anyway if the heuristics the OOM killer uses
> decide that it, as possibly the largest consumer of memory, is the best
> candidate to kill.

I agree: if Ganesha's memory footprint is huge, the OOM killer is going to
come hunting... And I also agree that if overall system memory pressure is
high, the system as a whole is likely unstable. Restarting Ganesha, as one
of the bigger memory consumers, might help with that (and if a Ganesha
restart triggers HA to restart the whole node, that would certainly clear
the problem). Of course, a restart (or any other solution short of a serious
memory diet, which we should pursue wherever possible anyway) won't help if
the workload really is too big for the system (and then a memory diet
probably just means the users manage to make their workload even bigger :-).

> d) And are we sure that there aren't any memory leaks still to be
> uncovered? I'd almost rather spend time on that than devote a lot of
> energy to trying to keep running in a marginal situation.

Yes, we should continue to hunt down memory leaks. We should also make sure
our allocator is working well to prevent memory fragmentation (which is
effectively also a memory leak, just one that's harder to control from
source code).

Frank
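P.S. On (b), "need to confirm how jemalloc is built": if we're linked
against jemalloc we may be able to ask it directly instead of digging
through build logs. Another hedged sketch; it assumes the read-only
"config.munmap" mallctl that the jemalloc 3.x/4.x manuals document, which a
differently built or newer jemalloc may not expose:

    #include <stdbool.h>
    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    int main(void)
    {
            bool munmap_enabled;
            size_t len = sizeof(munmap_enabled);

            /* "config.munmap" reports whether --enable-munmap was
             * specified at build time, i.e. whether jemalloc will
             * munmap() chunks back to the OS when it can. */
            if (mallctl("config.munmap", &munmap_enabled, &len,
                        NULL, 0) == 0)
                    printf("jemalloc munmap: %s\n",
                           munmap_enabled ? "enabled" : "disabled");
            else
                    printf("config.munmap not exposed here\n");
            return 0;
    }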
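P.P.S. On the fragmentation question: before we guess at remedies we could
at least measure it. jemalloc's global counters give a rough picture --
active minus allocated is roughly the waste inside in-use pages, and mapped
minus active is pages jemalloc holds but isn't using. A sketch, assuming
stats were enabled when jemalloc was built (they are by default, as far as
I know):

    #include <stdint.h>
    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    static void print_heap_stats(void)
    {
            uint64_t epoch = 1;
            size_t elen = sizeof(epoch);
            size_t allocated, active, mapped;
            size_t slen = sizeof(size_t);

            /* Stats are cached; writing to "epoch" refreshes them. */
            mallctl("epoch", &epoch, &elen, &epoch, sizeof(epoch));
            mallctl("stats.allocated", &allocated, &slen, NULL, 0);
            mallctl("stats.active", &active, &slen, NULL, 0);
            mallctl("stats.mapped", &mapped, &slen, NULL, 0);

            printf("allocated=%zu active=%zu mapped=%zu\n",
                   allocated, active, mapped);
    }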
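P.P.P.S. On the OOM-killer point in (c): Linux does let a process see and
influence its own badness score via /proc. Whether we'd want ganesha.nfsd to
deprioritize itself is a policy question, but for completeness, a Linux-only
sketch (lowering the score needs CAP_SYS_RESOURCE, and the -500 value is an
arbitrary example):

    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/proc/self/oom_score", "r");
            int score;

            /* Current badness score as the OOM killer sees it. */
            if (f != NULL) {
                    if (fscanf(f, "%d", &score) == 1)
                            printf("oom_score: %d\n", score);
                    fclose(f);
            }

            /* Negative adjustments make us a less likely victim. */
            f = fopen("/proc/self/oom_score_adj", "w");
            if (f != NULL) {
                    fprintf(f, "-500\n");
                    fclose(f);
            }
            return 0;
    }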