I'm not sure I would purge the cache before failing an allocation, but I would make sure that we don't exceed any of the limits that we have set for the different caches. The problem is that we don't check for limits on all memory; some structures can just grow without bound. We must have limits for all caches, including inodes, exports, number of clients, number of locks, and so on. If we don't have limits on all memory structures, and enforce them, we are going to allow malicious clients to bring down the server. Frank, you are doing the easy part by aborting on any failed allocation, but we should not make this change until we have solutions for all the other issues that were discussed.
Marc.
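A minimal sketch of the kind of per-cache limit enforcement Marc is describing; the struct, counters, and cache_try_reserve() helper are hypothetical illustrations, not Ganesha's actual API:

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Hypothetical per-cache accounting: one counter and one configured
     * limit for each cache that can otherwise grow without bound
     * (inodes, exports, clients, locks, ...). */
    struct cache_limit {
            atomic_size_t used;   /* current number of entries */
            size_t limit;         /* configured maximum, 0 = unlimited */
    };

    /* Reserve one slot before allocating the entry.  Returns false when
     * the limit is hit, so a malicious client cannot grow the cache
     * indefinitely; the caller must reject the request or evict first. */
    static bool cache_try_reserve(struct cache_limit *c)
    {
            size_t used = atomic_fetch_add(&c->used, 1);

            if (c->limit != 0 && used >= c->limit) {
                    atomic_fetch_sub(&c->used, 1);  /* roll back */
                    return false;
            }
            return true;
    }

    /* Release the slot when the entry is freed. */
    static void cache_release(struct cache_limit *c)
    {
            atomic_fetch_sub(&c->used, 1);
    }

With a check like this on every allocation path that extends a cache, the server hits a configured ceiling and can evict or refuse, rather than discovering the problem only when malloc() fails.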
From: mala...@linux.vnet.ibm.com
To: Marc Eshel/Almaden/IBM@IBMUS
Cc: Frank Filz <ffilz...@mindspring.com>, nfs-ganesha-devel@lists.sourceforge.net
Date: 11/02/2015 01:24 PM
Subject: Re: [Nfs-ganesha-devel] Topic for discussion - Out of Memory Handling

Marc Eshel [es...@us.ibm.com] wrote:
> Yes, it looks like I am outvoted; memory management is complicated. Let me
> first say that under no condition should we reboot the node; any action
> should be limited to the Ganesha process. When we fail to get heap memory,
> then yes, kill the process. It would be nice at that point to get as much
> information as possible to debug the problem; it can be a leak or memory
> corruption, so we might need some memory in reserve to collect the
> information. We should manage the Ganesha cache in a way that will not cause
> it to run out of memory, so if we are getting memory to extend a cache we
> should not abort before trying to reduce the cache size.
> Marc.

If I understand correctly, what Marc is recommending (probably the best option) is that we try to allocate memory and, if that fails, we empty our caches. After purging our caches, we try again to allocate. If the second allocation fails, then we are SOL. If not, we continue as though nothing has happened!

We could also call memory defragmentation (malloc_trim) in addition to purging our caches (a sketch of such a wrapper appears after this message). This is what the Linux kernel does (as do many OSes), but I am not sure how easy this is to implement. Maybe: first pass, abort on the first failure; then slowly implement cache purges and plug in the second-allocation technique...

To be honest, we had first-hand experience working on a memory allocation failure in the past. The Linux OOM killer fired before we ever got ENOMEM, so the second-allocation idea may not be useful on Linux!

Regards, Malahal.
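A minimal sketch of the allocate-purge-retry wrapper Malahal outlines, assuming a hypothetical purge_all_caches() hook standing in for Ganesha's real cache-reaping entry points; malloc_trim() is glibc-specific:

    #include <stdlib.h>
    #include <malloc.h>   /* malloc_trim() is a glibc extension */

    /* Hypothetical hook: in a real server this would ask each cache
     * (inodes, clients, locks, ...) to drop reclaimable entries. */
    static void purge_all_caches(void)
    {
            /* placeholder: real cache-reaping logic goes here */
    }

    /* Allocate-purge-retry along the lines described above: on the
     * first failure, purge caches and return freed heap pages to the
     * kernel, then try once more before giving up. */
    void *alloc_with_retry(size_t size)
    {
            void *p = malloc(size);

            if (p != NULL)
                    return p;

            purge_all_caches();   /* drop reclaimable cache entries */
            malloc_trim(0);       /* glibc: release free heap to the OS */

            p = malloc(size);
            if (p == NULL)
                    abort();      /* second failure: we are SOL */
            return p;
    }

As Malahal notes, on Linux the kernel OOM killer may act before malloc() ever returns NULL, so the retry path may never be reached in practice; the wrapper is only a sketch of the technique, not a guarantee it helps on a given OS.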