To help with the discussion, here is a patch that adds a file describing
most of our memory allocations (at least those I could find easily,
ignoring library functions outside the malloc family):

https://review.gerrithub.io/250912

Please comment on GerritHub if you have additional thoughts about any of
them.

Thanks

Frank

> -----Original Message-----
> From: Frank Filz [mailto:ffilz...@mindspring.com]
> Sent: Wednesday, October 28, 2015 11:55 AM
> To: nfs-ganesha-devel@lists.sourceforge.net
> Subject: [Nfs-ganesha-devel] Topic for discussion - Out of Memory Handling
> 
> We have had various discussions over the years about how best to handle
> out-of-memory conditions.
> 
> In the meantime, our code is littered with attempts to handle the
> situation; however, it is not clear to me that these attempts really
> solve anything. If we don't have 100% recoverability, we likely just
> delay the crash. Even if we manage to avoid crashing, we may wobble
> along without really handling things well, causing retry storms and the
> like that just dig us in deeper. Another possibility is that we return
> an error to the client that gets translated into EIO or some other
> error the application isn't prepared to handle.
> 
> If instead we simply aborted, the HA systems most of us run under would
> restart Ganesha. The clients would see some delay, but there should be
> no errors visible to them. Depending on how well grace period/state
> recovery is implemented (and in particular how well it is integrated
> with other file servers such as CIFS/SMB, or across a cluster), there
> could be some openings for lock violations (someone could steal a lock
> from one of our clients while Ganesha is down).
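> 
> As a rough illustration (a minimal, self-contained sketch rather than
> actual Ganesha code; the gsh_malloc name follows our wrapper naming
> conventions, and the logging here is simplified to plain stderr), an
> abort-on-failure allocation wrapper might look like:
> 
>     #include <stdio.h>
>     #include <stdlib.h>
> 
>     /* Sketch: allocate or die.  On failure, log and abort so the HA
>      * layer restarts Ganesha instead of letting it limp along. */
>     static inline void *gsh_malloc(size_t size)
>     {
>             void *p = malloc(size);
> 
>             if (p == NULL) {
>                     fprintf(stderr, "malloc(%zu) failed, aborting\n",
>                             size);
>                     abort();
>             }
>             return p;
>     }
> 
> Callers would then never check for NULL; a failed allocation becomes a
> clean restart rather than an uncertain wobble.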
> 
> Aborting would have several advantages. First, it would immediately
> clear up any memory leaks. Second, if some transient activity resulted
> in high memory utilization, that might also be cleared up. Third, it
> would avoid retry storms and the like that might just aggravate the low
> memory condition. In addition, it would force the sysadmin to deal with
> a workload that overloaded the server, possibly by adding nodes in a
> clustered environment, or adding memory to the server.
> 
> No matter what we decide to do, another thing we need to look at is
> more memory throttling. Cache inode has a limit on the number of
> inodes, which is helpful but incomplete. Other candidates for memory
> throttling (a counting sketch follows the list) would be:
> 
> Number of clients
> Number of state objects (opens, locks, delegations, layouts), per
> client and/or global
> Size of ACLs and number of ACLs cached
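> 
> As a hypothetical sketch of what such throttles could look like (the
> resource_limit type and function names are illustrative, not existing
> code), each tracked resource gets a shared atomic counter that is
> checked before the object is created:
> 
>     #include <stdatomic.h>
>     #include <stdbool.h>
> 
>     /* One instance per throttled resource: clients, state, ACLs... */
>     struct resource_limit {
>             atomic_long count;
>             long limit;
>     };
> 
>     /* Call before creating the object; false means the limit has been
>      * reached, and the caller should return NFS4ERR_RESOURCE (or a
>      * similar error) instead of allocating. */
>     static bool limit_try_acquire(struct resource_limit *rl)
>     {
>             if (atomic_fetch_add(&rl->count, 1) >= rl->limit) {
>                     atomic_fetch_sub(&rl->count, 1);
>                     return false;
>             }
>             return true;
>     }
> 
>     /* Call when the object is destroyed. */
>     static void limit_release(struct resource_limit *rl)
>     {
>             atomic_fetch_sub(&rl->count, 1);
>     }
> 
> Throttling and abort-on-OOM would complement each other: refuse new
> work with a resource error first, and abort only when a real allocation
> still fails.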
> 
> I'm sure there's more, discuss.
> 
> Frank

