On Thu, 12 Mar 2009 15:29:02 +0100, Calum Mackay <Calum.Mackay at sun.com> wrote:

> Dai Ngo wrote:
>> The problem was caused by an infinite loop in rfs4_cbinfo_hold().
>> The looping thread kept a hold on the DBE of an rfs4_deleg_state
>> entry, so the reaper thread was delayed indefinitely. Because the
>> deleg_state table was never cleaned, the reaper threads of the file
>> and client tables were delayed as well, since the deleg_state
>> entries hold references on the file and client DBEs.
>
> Interesting; I wonder if this, or something similar, might be related to
> the issue being seen in SC HA-NFSv4 where the distributed stable storage
> files are occasionally not being removed from the SC RG state paths?
> That mechanism is also dependent on the client DBE reap.
>
> However, if it's the case that the /var/nfs state file does get removed
> when this occurs, that would be something else...

fwiw, Calum is talking about:

http://bugs.opensolaris.org/view_bug.do?bug_id=6802893


> cheers,
> calum.
>
>> I added detailed analysis in the evaluation section of the CR.
>>
>> The fix is to limit the number of retries to 5 (5 secs).
>>
>> webrev: http://cr.opensolaris.org/~dain/6768607/
>> CR: http://monaco.sfbay/detail.jsf?cr=6768607
>>
>> Thanks,
>> -Dai
>> _______________________________________________
>> nfs-discuss mailing list
>> nfs-discuss at opensolaris.org
>



-- 
frankB

It is always possible to agglutinate multiple separate problems
into a single complex interdependent solution.
In most cases this is a bad idea.
