On Jul 27, 2013, at 11:25 PM, Konstantin Belousov <[email protected]> wrote:

> On Sat, Jul 27, 2013 at 03:13:05PM -0700, Michael Tratz wrote:
>> Let's assume the pid which started the deadlock is 14001 (it will be a 
>> different pid when we get the results, because the machine has been 
>> restarted)
>> 
>> I type:
>> 
>> show proc 14001
>> 
>> I get the thread numbers from that output and type:
>> 
>> show thread xxxxx
>> 
>> for each one.
>> 
>> And a trace for each thread with the command?
>> 
>> tr xxxx
>> 
>> Anything else I should try to get or do? Or is that not the data at all you 
>> are looking for?
>> 
> Yes, everything else which is listed in the 'debugging deadlocks' page
> must be provided, otherwise the deadlock cannot be tracked.
> 
> The investigator should be able to see the whole deadlock chain (loop)
> to make any useful advance.

Ok, I have made some excellent progress in debugging the NFS deadlock.

Rick! You are a genius. :-) You found the right commit: r250907 (dated May 22) 
is definitely the problem.

Here is how I did the testing: One machine received a kernel before r250907, 
the second machine received a kernel after r250907. Sure enough, within a few 
hours the machine with r250907 went into the usual deadlock state. The machine 
without that commit kept on working fine. Then I went back to the latest 
revision (r253726), but left r250907 out. The machines have been running 
happily and rock solid without any deadlocks. I have expanded the testing to 3 
machines now and there are no reports of any issues.

I guess now Konstantin has to figure out why that commit is causing the 
deadlock. Lovely! :-) I will get the information listed on the 'debugging 
deadlocks' page as soon as possible. I'm a little behind with my normal 
workload, but I expect to have the data by Tuesday evening or Wednesday.

Thanks again!!

Michael

_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"
