Dai Ngo wrote:
> The problem was caused by an infinite loop in rfs4_cbinfo_hold().
> This thread put a hold on the DBE of a rfs4_deleg_state causing the reaper
> thread to be delayed (forever). Since the deleg_state table was not cleaned,
> this caused the reaper threads of the file and client table to also be
> delayed, due to the hold on their DBEs from the deleg_state entries.

Interesting; I wonder if this, or something similar, might be related to
the issue being seen in SC HA-NFSv4, where the distributed stable storage
files are occasionally not being removed from the SC RG state paths? That
mechanism is also dependent on the client DBE reap.

However, if it's the case that the /var/nfs state file does get removed
when this occurs, that would be something else...

cheers,
calum.

> I added detailed analysis in the evaluation section of the CR.
>
> The fix is to limit the number of retries to 5 (5 secs).
>
> webrev: http://cr.opensolaris.org/~dain/6768607/
> CR: http://monaco.sfbay/detail.jsf?cr=6768607
>
> Thanks,
> -Dai

--
Calum Mackay
Senior Staff Engineer
Systems Group, Quality Office
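
For anyone following along without the webrev to hand, the general shape of a "cap the retries at 5" fix can be sketched roughly as below. This is only an illustration of a bounded retry loop, not the actual rfs4_cbinfo_hold() change: bounded_wait(), ready_fn, delay_fn, MAX_RETRIES, and the two test doubles are all hypothetical names, and the one-second pause per retry is an assumption read from the "5 (5 secs)" wording.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RETRIES 5	/* cap quoted in the fix description */

/* Hypothetical stand-in for whatever condition the real code waits on. */
typedef bool (*ready_fn)(void *arg);

/* Hypothetical delay hook, so the sketch does not depend on a real sleep. */
typedef void (*delay_fn)(unsigned seconds);

/*
 * Poll ready() at most MAX_RETRIES times, pausing one second after each
 * failed attempt. Returns true if the condition cleared, false if we gave
 * up. On false, the caller is expected to drop any hold it has taken, so
 * that a reaper waiting on that hold is not blocked indefinitely.
 */
bool
bounded_wait(ready_fn ready, void *arg, delay_fn delay)
{
	for (int tries = 0; tries < MAX_RETRIES; tries++) {
		if (ready(arg))
			return (true);
		if (delay != NULL)
			delay(1);
	}
	return (false);
}

/* Test double: a condition that never clears; counts how often it is polled. */
bool
never_ready(void *arg)
{
	(*(int *)arg)++;
	return (false);
}

/* Test double: a condition that clears on the third poll. */
bool
ready_on_third(void *arg)
{
	return (++(*(int *)arg) >= 3);
}
```

With never_ready() the loop gives up after exactly five polls instead of spinning forever, which is the property that matters for letting the hold be released.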