I'm not sure I would purge the cache before failing an allocation, but
I would make sure that we don't exceed any of the limits that we have
set for the different caches. The problem is that we don't check limits
on all memory; some structures can just grow without bound. We must
have limits for all caches, including inodes, exports, number of
clients, number of locks, ...
If we don't have limits on all memory structures, and enforce them, we
are going to allow malicious clients to bring down the server.
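To make that concrete, here is a minimal sketch of what per-cache limit
enforcement could look like. All names here (cache_try_reserve,
cache_limit, the cache_kind enum) are hypothetical, not existing
Ganesha identifiers, and a real version would need atomic counters:

    /* Hypothetical per-cache limit check, consulted before growing any
     * cache.  Not thread-safe as written; real code would use atomics. */
    #include <stdbool.h>
    #include <stdint.h>

    enum cache_kind {
            CACHE_INODE, CACHE_EXPORT, CACHE_CLIENT, CACHE_LOCK, CACHE_MAX
    };

    static uint64_t cache_limit[CACHE_MAX]; /* configured hard limits */
    static uint64_t cache_count[CACHE_MAX]; /* current entry counts   */

    /* Refuse to grow a cache past its limit instead of letting a
     * malicious client drive the whole server out of memory. */
    static bool cache_try_reserve(enum cache_kind k)
    {
            if (cache_count[k] >= cache_limit[k])
                    return false; /* caller must evict or reject */
            cache_count[k]++;
            return true;
    }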
Frank, you are doing the easy part by aborting on any failed
allocation, but we should not make this change until we have solutions
for all the other issues that were discussed.
Marc.



From:   mala...@linux.vnet.ibm.com
To:     Marc Eshel/Almaden/IBM@IBMUS
Cc:     Frank Filz <ffilz...@mindspring.com>, 
nfs-ganesha-devel@lists.sourceforge.net
Date:   11/02/2015 01:24 PM
Subject:        Re: [Nfs-ganesha-devel] Topic for discussion - Out of Memory Handling



Marc Eshel [es...@us.ibm.com] wrote:
>    Yes, it looks like I am outvoted; memory management is complicated.
>    Let me first say that under no condition should we reboot the node;
>    any action should be limited to the Ganesha process. When we fail
>    to get heap memory, then yes, kill the process. It would be nice at
>    that point to get as much information as possible to debug the
>    problem; it can be a leak or memory corruption, so we might need
>    some memory in reserve to collect the information. We should manage
>    the Ganesha cache in a way that will not cause it to run out of
>    memory, so if we are getting memory to extend a cache we should not
>    abort before trying to reduce the cache size.
>    Marc.

If I understand correctly, what Marc is recommending (probably the best
approach) is that we try to allocate memory, and if that fails, we
empty our caches. After purging our caches, we try the allocation
again. If the second allocation fails, then we are SOL. If not, we
continue as though nothing has happened!
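As a sketch, the allocator wrapper could look something like this
(purge_caches() and alloc_with_reclaim() are hypothetical names, not
existing Ganesha APIs):

    #include <stdlib.h>

    extern void purge_caches(void); /* hypothetical: evict reclaimable
                                     * cache entries */

    /* Try the allocation; on failure, shed cache memory and try once
     * more.  If the second attempt also fails, we are out of options. */
    void *alloc_with_reclaim(size_t n)
    {
            void *p = malloc(n);

            if (p == NULL) {
                    purge_caches();
                    p = malloc(n);
                    if (p == NULL)
                            abort(); /* SOL: genuinely out of memory */
            }
            return p;
    }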

We could also call malloc_trim() to release free heap memory back to
the OS, in addition to purging our caches...
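For reference, malloc_trim() is glibc-specific; a pad of 0 asks it to
release as much free heap memory as possible, and the return value says
whether anything was actually given back. Slotted in between the purge
and the retry, it might look like this (sketch):

    #include <malloc.h>  /* glibc-specific, not POSIX */
    #include <stdbool.h>

    /* After purging caches and before retrying the allocation, hand
     * free heap pages back to the kernel.  Returns true if glibc was
     * able to release anything. */
    static bool release_free_heap(void)
    {
            return malloc_trim(0) == 1;
    }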

This is what the Linux kernel does (as do many OSes), but I am not
sure how easy this is to implement.

Maybe: in a first pass, abort on the first failure. Then slowly
implement cache purges and plug in the second-allocation technique...
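The first-pass behavior would be just a wrapper that aborts; something
like the following (xmalloc is a hypothetical name, and Ganesha's own
allocation wrappers may already do something similar):

    #include <stdio.h>
    #include <stdlib.h>

    /* First pass: no recovery, just log what we can and abort so the
     * failure is visible and debuggable. */
    static void *xmalloc(size_t n)
    {
            void *p = malloc(n);

            if (p == NULL) {
                    fprintf(stderr, "allocation of %zu bytes failed\n", n);
                    abort();
            }
            return p;
    }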

To be honest, we had first-hand experience working on a memory
allocation failure in the past: the Linux OOM killer fired before we
ever got ENOMEM. So the second-allocation idea may not be useful on
Linux!

Regards, Malahal.

