On Jul 27, 2013, at 11:25 PM, Konstantin Belousov <[email protected]> wrote:

> On Sat, Jul 27, 2013 at 03:13:05PM -0700, Michael Tratz wrote:
>> Let's assume the pid which started the deadlock is 14001 (it will be a
>> different pid when we get the results, because the machine has been
>> restarted)
>>
>> I type:
>>
>> show proc 14001
>>
>> I get the thread numbers from that output and type:
>>
>> show thread xxxxx
>>
>> for each one.
>>
>> And a trace for each thread with the command?
>>
>> tr xxxx
>>
>> Anything else I should try to get or do? Or is that not the data at all you
>> are looking for?
>>
> Yes, everything else which is listed in the 'debugging deadlocks' page
> must be provided, otherwise the deadlock cannot be tracked.
>
> The investigator should be able to see the whole deadlock chain (loop)
> to make any useful advance.

Ok, I have made some excellent progress in debugging the NFS deadlock.

Rick! You are a genius. :-) You found the right commit: r250907 (dated May 22)
is definitely the problem.

Here is how I did the testing: one machine received a kernel before r250907,
the second machine received a kernel after r250907. Sure enough, within a few
hours the machine with r250907 went into the usual deadlock state. The machine
without that commit kept on working fine. Then I went back to the latest
revision (r253726), but left r250907 out. The machines have been running
happily and rock solid without any deadlocks. I have expanded the testing to
3 machines now, with no reports of any issues.

I guess now Konstantin has to figure out why that commit is causing the
deadlock. Lovely! :-) I will get that information as soon as possible. I'm a
little behind with my normal workload, but I expect to have the data by
Tuesday evening or Wednesday.

Thanks again!!

Michael

_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"
