On Wed, 2015-10-28 at 21:33 -0700, Frank Filz wrote:
> So my question is what allocations do we attempt to recover from? And
> what does that recovery look like? And how do we make sure Ganesha is
> actually running in a sane way if we do recover? And are we just
> kicking the can to the next allocation which chooses not to recover?
> It seems like if we are going to try to keep running, we should do so
> in almost all cases, using abort only for those cases that are just
> way too complex to recover from (for example, an out-of-memory
> condition in unlock, where lock owners aren't supported, can make it
> impossible to get back to a correct set of locks).
>
> Frank

That sounds like a good strategy to me. +1
Swen

> From: Marc Eshel [mailto:es...@us.ibm.com]
> Sent: Wednesday, October 28, 2015 7:38 PM
> To: Frank Filz <ffilz...@mindspring.com>
> Cc: nfs-ganesha-devel@lists.sourceforge.net
> Subject: Re: [Nfs-ganesha-devel] Topic for discussion - Out of Memory Handling
>
> I don't believe that we need to restart Ganesha on every out-of-memory
> condition, for many reasons, but I will agree that we can have two
> types of calls: one that can accept a no-memory rc, and one that
> terminates Ganesha if the call is not successful.
>
> Marc.
>
> From: "Frank Filz" <ffilz...@mindspring.com>
> To: <nfs-ganesha-devel@lists.sourceforge.net>
> Date: 10/28/2015 11:55 AM
> Subject: [Nfs-ganesha-devel] Topic for discussion - Out of Memory Handling
>
> We have had various discussions over the years as to how best to
> handle out-of-memory conditions.
>
> In the meantime, our code is littered with attempts to handle the
> situation; however, it is not clear to me that these really solve
> anything. If we don't have 100% recoverability, we likely just delay
> the crash. Even if we manage to avoid crashing, we may wobble along,
> not really handling things well, causing retry storms and such (which
> just dig us in deeper). Another possibility is that we return an error
> to the client that gets translated into EIO or some other error the
> application isn't prepared to handle.
>
> If instead we just aborted, the HA systems most of us run under would
> restart Ganesha. The clients would see some delay, but there should be
> no visible errors to the clients. Depending on how well grace-period/
> state recovery is implemented (and in particular how well it's
> integrated with other file servers such as CIFS/SMB, or across a
> cluster), there could be some openings for lock violation (someone is
> able to steal a lock from one of our clients while Ganesha is down).
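Marc's two-call-type idea could look roughly like the sketch below. The wrapper names (`gsh_malloc_must`, `gsh_malloc_try`) are hypothetical, chosen only for illustration; they are not Ganesha's actual allocator API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the two allocation flavors discussed in the thread:
 * - gsh_malloc_must(): for call sites that cannot recover; abort so
 *   the HA layer restarts Ganesha cleanly.
 * - gsh_malloc_try(): for call sites prepared to handle a no-memory
 *   rc (e.g. optional cache growth) and back off gracefully. */

void *gsh_malloc_must(size_t n)
{
	void *p = malloc(n);

	if (p == NULL) {
		fprintf(stderr,
			"unrecoverable allocation failure (%zu bytes)\n", n);
		abort();	/* let HA restart us */
	}
	return p;
}

void *gsh_malloc_try(size_t n)
{
	return malloc(n);	/* caller MUST check for NULL */
}
```

The point of splitting the API is that a caller's choice of wrapper documents, at the call site, whether an error path for allocation failure actually exists.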
> Aborting would have several advantages. First, it would immediately
> clear up any memory leaks. Second, if there was some transient
> activity that resulted in high memory utilization, that might also be
> cleared up. Third, it would avoid retry storms and such that might
> just aggravate the low-memory condition. In addition, it would force
> the sysadmin to deal with a workload that overloaded the server,
> possibly by adding additional nodes in a clustered environment, or by
> adding memory to the server.
>
> No matter what we decide to do, another thing we need to look at is
> more memory throttling. Cache inode has a limit on the number of
> inodes. This is helpful, but is incomplete. Other candidates for
> memory throttling would be:
>
>   Number of clients
>   Number of state (opens, locks, delegations, layouts) (per client
>   and/or global)
>   Size of ACLs and number of ACLs cached
>
> I'm sure there's more, discuss.
>
> Frank
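The throttling candidates above all reduce to the same primitive: a counted resource with a hard cap, where a request that would exceed the cap is rejected up front (e.g. with NFS4ERR_RESOURCE or NFS4ERR_DELAY) instead of attempting an allocation. A minimal sketch, with illustrative names that are not Ganesha's actual code:

```c
#include <stdbool.h>
#include <pthread.h>

/* Hypothetical cap-checked counter, usable for number of clients,
 * state objects (opens/locks/delegations/layouts), or cached ACL
 * bytes.  One instance per throttled resource. */
struct resource_throttle {
	pthread_mutex_t lock;
	unsigned long current;
	unsigned long limit;
};

/* Try to take n units; false means the caller should reject the
 * request (or ask the client to retry later) rather than allocate. */
bool throttle_acquire(struct resource_throttle *t, unsigned long n)
{
	bool ok;

	pthread_mutex_lock(&t->lock);
	ok = (t->current + n <= t->limit);
	if (ok)
		t->current += n;
	pthread_mutex_unlock(&t->lock);
	return ok;
}

/* Return n units when the client/state/ACL entry is released. */
void throttle_release(struct resource_throttle *t, unsigned long n)
{
	pthread_mutex_lock(&t->lock);
	t->current = (n > t->current) ? 0 : t->current - n;
	pthread_mutex_unlock(&t->lock);
}
```

Rejecting early like this keeps the failure at a protocol-visible, retryable boundary, rather than deep inside an operation where an allocation failure is hard to unwind.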
------------------------------------------------------------------------------
_______________________________________________
Nfs-ganesha-devel mailing list
Nfs-ganesha-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs-ganesha-devel