To help with the discussion, here is a patch that creates a file that describes most of our memory allocation (at least that I could find easily, ignoring library functions outside of the malloc family):
https://review.gerrithub.io/250912

Please make comments on gerrithub if you have additional thoughts about any of them.

Thanks

Frank

> -----Original Message-----
> From: Frank Filz [mailto:ffilz...@mindspring.com]
> Sent: Wednesday, October 28, 2015 11:55 AM
> To: nfs-ganesha-devel@lists.sourceforge.net
> Subject: [Nfs-ganesha-devel] Topic for discussion - Out of Memory Handling
>
> We have had various discussions over the years about how best to handle
> out of memory conditions.
>
> In the meantime, our code is littered with attempts to handle the
> situation; however, it is not clear to me that these really solve
> anything. If we don't have 100% recoverability, we likely just delay the
> crash. Even if we manage to avoid crashing, we may wobble along not
> really handling things well, causing retry storms and such (which just
> dig us in deeper). Another possibility is that we return an error to the
> client that gets translated into EIO or some other error the application
> isn't prepared to handle.
>
> If instead we just aborted, the HA systems most of us run under would
> restart Ganesha. The clients would see some delay, but there should be
> no visible errors to them. Depending on how well grace period/state
> recovery is implemented (and in particular how well it's integrated with
> other file servers such as CIFS/SMB, or across a cluster), there could
> be some openings for lock violation (someone is able to steal a lock
> from one of our clients while Ganesha is down).
>
> Aborting would have several advantages. First, it would immediately
> clear up any memory leaks. Second, if some transient activity resulted
> in high memory utilization, that might also be cleared up. Third, it
> would avoid retry storms and such that might just aggravate the low
> memory condition. In addition, it would force the sysadmin to deal with
> a workload that overloaded the server, possibly by adding nodes in a
> clustered environment, or adding memory to the server.
>
> No matter what we decide to do, another thing we need to look at is
> more memory throttling. Cache inode has a limit on the number of
> inodes. This is helpful, but incomplete. Other candidates for memory
> throttling would be:
>
> Number of clients
> Number of state objects (opens, locks, delegations, layouts), per
> client and/or global
> Size of ACLs and number of ACLs cached
>
> I'm sure there's more; discuss.
>
> Frank
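To make the abort-on-failure option above concrete: if every allocation goes through a common wrapper, the out-of-memory path collapses to a single place. Here is a minimal sketch; the gsh_malloc/gsh_calloc names are illustrative wrapper names for this discussion, not necessarily what the tree actually provides.

    /* Minimal sketch of abort-on-failure allocation wrappers.
     * Names are illustrative, not necessarily the in-tree API. */
    #include <stdio.h>
    #include <stdlib.h>

    void *gsh_malloc(size_t size)
    {
        void *p = malloc(size);

        if (p == NULL) {
            /* Out of memory: log and abort so the HA layer
             * restarts us cleanly instead of limping along. */
            fprintf(stderr, "malloc(%zu) failed, aborting\n", size);
            abort();
        }
        return p;
    }

    void *gsh_calloc(size_t nmemb, size_t size)
    {
        void *p = calloc(nmemb, size);

        if (p == NULL) {
            fprintf(stderr, "calloc(%zu, %zu) failed, aborting\n",
                    nmemb, size);
            abort();
        }
        return p;
    }

The point is that no caller ever sees NULL, so all of the scattered (and probably untested) recovery attempts could simply be deleted.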
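On the throttling side, most of the candidates listed above (clients, state objects, cached ACLs) reduce to the same mechanism: an atomic counter checked at object creation and released at teardown. A rough sketch follows, with made-up names and no claim about where the limits would come from; a false return would map to something like NFS4ERR_RESOURCE upstream.

    /* Rough sketch of a counter-based resource throttle.
     * Names and the error mapping are illustrative only. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdatomic.h>

    struct res_limit {
        atomic_uint_fast32_t count; /* current number of objects */
        uint_fast32_t max;          /* configured ceiling */
    };

    /* Try to take a slot; caller must release it on teardown. */
    bool res_limit_take(struct res_limit *rl)
    {
        uint_fast32_t cur = atomic_fetch_add(&rl->count, 1);

        if (cur >= rl->max) {
            /* Over the limit: undo and refuse the new object. */
            atomic_fetch_sub(&rl->count, 1);
            return false;
        }
        return true;
    }

    void res_limit_release(struct res_limit *rl)
    {
        atomic_fetch_sub(&rl->count, 1);
    }

Per-client state limits would instantiate one such counter per clientid; the global variants would share a single instance.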