On Thu, 12 Mar 2009 15:29:02 +0100, Calum Mackay <Calum.Mackay at sun.com> wrote:
> Dai Ngo wrote:
>> The problem was caused by an infinite loop in rfs4_cbinfo_hold().
>> This thread put a hold on the DBE of a rfs4_deleg_state causing the reaper
>> thread to be delayed (forever). Since the deleg_state table was not cleaned,
>> this caused the reaper threads of the file and client table to also be
>> delayed, due to the hold on their DBEs from the deleg_state entries.
>
> Interesting; I wonder if this, or something similar, might be related to
> the issue being seen in SC HA-NFSv4 where the distributed stable storage
> files are occasionally not being removed from the SC RG state paths?
> That mechanism is also dependent on the client DBE reap.
>
> However, if it's the case that the /var/nfs state file does get removed
> when this occurs, that would be something else...

fwiw, Calum is talking about:
http://bugs.opensolaris.org/view_bug.do?bug_id=6802893

> cheers,
> calum.
>
>> I added detailed analysis in the evaluation section of the CR.
>>
>> The fix is to limit the number of retries to 5 (5 secs).
>>
>> webrev: http://cr.opensolaris.org/~dain/6768607/
>> CR: http://monaco.sfbay/detail.jsf?cr=6768607
>>
>> Thanks,
>> -Dai
>> _______________________________________________
>> nfs-discuss mailing list
>> nfs-discuss at opensolaris.org
>

-- 
frankB

It is always possible to agglutinate multiple separate problems
into a single complex interdependent solution.
In most cases this is a bad idea.
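
P.S. For anyone following along without the webrev open: the fix Dai describes ("limit the number of retries to 5") amounts to turning an unbounded wait loop into a bounded one. The sketch below only illustrates that general pattern; it is not the code from the webrev. The names hold_with_retry_limit(), ready_fn, never_ready() and ready_when_zero() are hypothetical, and the one-second wait per attempt is an assumption read off the "5 (5 secs)" remark.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: mirrors the "5 retries" mentioned in the post. */
#define MAX_RETRIES 5

/* Hypothetical predicate: "is the resource we are waiting on usable yet?" */
typedef bool (*ready_fn)(void *arg);

/*
 * Bounded retry: poll ready() at most MAX_RETRIES times instead of
 * looping forever. Returns true on success, false once the limit is
 * hit, so the caller can drop whatever hold it took and let cleanup
 * proceed. *attempts_out receives the number of polls performed.
 * A real implementation would wait between attempts (the post implies
 * roughly one second each); the wait is omitted to keep this portable.
 */
static bool
hold_with_retry_limit(ready_fn ready, void *arg, int *attempts_out)
{
	int attempts = 0;

	while (attempts < MAX_RETRIES) {
		attempts++;
		if (ready(arg)) {
			*attempts_out = attempts;
			return (true);
		}
	}
	*attempts_out = attempts;
	return (false);
}

/* Demo predicate: never becomes ready (the case that used to spin). */
static bool
never_ready(void *arg)
{
	(void) arg;
	return (false);
}

/* Demo predicate: becomes ready once the counter in *arg reaches 0. */
static bool
ready_when_zero(void *arg)
{
	int *left = arg;

	if (*left == 0)
		return (true);
	(*left)--;
	return (false);
}
```

With never_ready() the helper gives up after five polls and returns false; with ready_when_zero() and a counter of 2 it succeeds on the third poll. The point is only that the failure path now terminates, which is what lets the hold be released and the reaper make progress.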
