On Wed, 2015-10-28 at 21:33 -0700, Frank Filz wrote:
> So my question is what allocations do we attempt to recover from? And
> what does that recovery look like? And how do we make sure Ganesha is
> actually running in a sane way if we do recover? And are we just
> kicking the can to the next allocation which chooses not to recover?
> It seems like if we are going to try to keep running, we should do so
> in almost all cases, using abort only for those cases that are just
> way too complex to recover from (for example, an out-of-memory
> condition in unlock, where lock owners aren't supported, can make it
> impossible to get back to a correct set of locks).
>
> Frank

That sounds like a good strategy to me. +1
Swen

> From: Marc Eshel [mailto:es...@us.ibm.com]
> Sent: Wednesday, October 28, 2015 7:38 PM
> To: Frank Filz <ffilz...@mindspring.com>
> Cc: nfs-ganesha-devel@lists.sourceforge.net
> Subject: Re: [Nfs-ganesha-devel] Topic for discussion - Out of Memory Handling
>
> I don't believe that we need to restart Ganesha on every out-of-memory
> condition, for many reasons, but I will agree that we can have two
> types of calls: one that can accept a no-memory rc, and one that
> terminates Ganesha if the call is not successful.
>
> Marc.
>
> From: "Frank Filz" <ffilz...@mindspring.com>
> To: <nfs-ganesha-devel@lists.sourceforge.net>
> Date: 10/28/2015 11:55 AM
> Subject: [Nfs-ganesha-devel] Topic for discussion - Out of Memory Handling
>
> We have had various discussions over the years as to how best to
> handle out-of-memory conditions.
>
> In the meantime, our code is littered with attempts to handle the
> situation; however, it is not clear to me that these really solve
> anything. If we don't have 100% recoverability, we likely just delay
> the crash. Even if we manage to avoid crashing, we may wobble along,
> not really handling things well, causing retry storms and such (which
> just dig us in deeper). Another possibility is that we return an error
> to the client that gets translated into EIO or some other error the
> application isn't prepared to handle.
>
> If instead we just aborted, the HA systems most of us run under would
> restart Ganesha. The clients would see some delay, but there should be
> no visible errors to the clients. Depending on how well grace-period/
> state recovery is implemented (and in particular how well it's
> integrated with other file servers such as CIFS/SMB, or across a
> cluster), there could be some openings for lock violation (someone is
> able to steal a lock from one of our clients while Ganesha is down).
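Marc's two-call-type idea could look roughly like the sketch below. The wrapper names (`gsh_malloc_must`, `gsh_malloc_try`) are hypothetical, chosen only for illustration; they are not Ganesha's actual allocator API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the two allocation flavors discussed in the thread:
 * - gsh_malloc_must(): for call sites that cannot recover; abort so
 *   the HA layer restarts Ganesha cleanly.
 * - gsh_malloc_try(): for call sites prepared to handle a no-memory
 *   rc (e.g. optional cache growth) and back off gracefully. */

void *gsh_malloc_must(size_t n)
{
	void *p = malloc(n);

	if (p == NULL) {
		fprintf(stderr,
			"unrecoverable allocation failure (%zu bytes)\n", n);
		abort();	/* let HA restart us */
	}
	return p;
}

void *gsh_malloc_try(size_t n)
{
	return malloc(n);	/* caller MUST check for NULL */
}
```

The point of splitting the API is that a caller's choice of wrapper documents, at the call site, whether an error path for allocation failure actually exists.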
> Aborting would have several advantages. First, it would immediately
> clear up any memory leaks. Second, if there was some transient
> activity that resulted in high memory utilization, that might also be
> cleared up. Third, it would avoid retry storms and such that might
> just aggravate the low-memory condition. In addition, it would force
> the sysadmin to deal with a workload that overloaded the server,
> possibly by adding additional nodes in a clustered environment, or by
> adding memory to the server.
>
> No matter what we decide to do, another thing we need to look at is
> more memory throttling. Cache inode has a limit on the number of
> inodes. This is helpful, but is incomplete. Other candidates for
> memory throttling would be:
>
>   Number of clients
>   Number of state (opens, locks, delegations, layouts) (per client
>   and/or global)
>   Size of ACLs and number of ACLs cached
>
> I'm sure there's more, discuss.
>
> Frank
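The throttling candidates above all reduce to the same primitive: a counted resource with a hard cap, where a request that would exceed the cap is rejected up front (e.g. with NFS4ERR_RESOURCE or NFS4ERR_DELAY) instead of attempting an allocation. A minimal sketch, with illustrative names that are not Ganesha's actual code:

```c
#include <stdbool.h>
#include <pthread.h>

/* Hypothetical cap-checked counter, usable for number of clients,
 * state objects (opens/locks/delegations/layouts), or cached ACL
 * bytes.  One instance per throttled resource. */
struct resource_throttle {
	pthread_mutex_t lock;
	unsigned long current;
	unsigned long limit;
};

/* Try to take n units; false means the caller should reject the
 * request (or ask the client to retry later) rather than allocate. */
bool throttle_acquire(struct resource_throttle *t, unsigned long n)
{
	bool ok;

	pthread_mutex_lock(&t->lock);
	ok = (t->current + n <= t->limit);
	if (ok)
		t->current += n;
	pthread_mutex_unlock(&t->lock);
	return ok;
}

/* Return n units when the client/state/ACL entry is released. */
void throttle_release(struct resource_throttle *t, unsigned long n)
{
	pthread_mutex_lock(&t->lock);
	t->current = (n > t->current) ? 0 : t->current - n;
	pthread_mutex_unlock(&t->lock);
}
```

Rejecting early like this keeps the failure at a protocol-visible, retryable boundary, rather than deep inside an operation where an allocation failure is hard to unwind.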
------------------------------------------------------------------------------
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel