On Mon Nov 15 23:23:12 EST 2010, [email protected] wrote:
> Regarding the "deadlock" report that I occasionally see on my CPU
> server console, I won't bore anyone with PC addresses or anything like
> that, but I will recommend something I believe to be a possible
> trigger: the failure always seems to occur within "exportfs", which in
> this case is used exclusively to run stats(1) remotely from my
> workstation. So the recommendation is that somebody like Erik, who is
> infinitely more clued up than I am in the kernel arcana, should run one
> or more stats sessions into a cpu server (I happen to be running
> fossil, so maybe Erik won't see this) and see if he can also trigger this
> behaviour. I'm hoping that it is not platform specific.
>
> Right now, I'm short of skills as well as a serial console :-(

i run stats all the time. i've never seen a lock loop caused by stats.
exportfs gets blamed all the time for the sins of others. possible
culprits are the tcp/ip stack, the kernel devices that stats accesses,
and of course the channel code itself.

it would be a good idea for you to track down all the pcs involved
and send them along. i can't think of another way of narrowing down
the list of potential suspects. not all of our usual suspects have an
alibi.

i assume you've fixed this? (not yet fixed on sources.)
/n/sources/plan9//sys/src/9/port/chan.c:1012,1018 - chan.c:1012,1020
  			/*
  			 * mh->mount->to == c, so start at mh->mount->next
  			 */
+ 			f = nil;
  			rlock(&mh->lock);
+ 			if(mh->mount)
  			for(f = mh->mount->next; f; f = f->next)
  				if((wq = ewalk(f->to, nil, names+nhave, ntry)) != nil)
  					break;
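
for illustration only, here is a stripped-down user-space sketch of the same guard. Mount, Mhead and nextmatch are hypothetical stand-ins, not the kernel's definitions; the integer compare replaces ewalk, and the rlock on the mount head is left out. the only point is that f starts out nil and mh->mount is checked before mh->mount->next is touched.

```c
#include <stddef.h>

/* hypothetical, simplified stand-ins for the kernel's Mount and Mhead */
typedef struct Mount Mount;
struct Mount {
	Mount *next;
	int to;		/* stand-in for the Chan* that a mount points at */
};

typedef struct Mhead Mhead;
struct Mhead {
	Mount *mount;	/* may be nil; the guard below covers that case */
};

/*
 * return the first mount after the head whose 'to' equals want,
 * or NULL if there is none.  f is initialised before the guard so
 * the result is well defined even when mh->mount is NULL.
 * (the real code holds a read lock on mh while walking the list;
 * that is omitted in this sketch.)
 */
Mount*
nextmatch(Mhead *mh, int want)
{
	Mount *f;

	f = NULL;
	if(mh->mount != NULL)
		for(f = mh->mount->next; f != NULL; f = f->next)
			if(f->to == want)
				break;
	return f;
}
```

with an empty mount head, nextmatch returns NULL instead of dereferencing a nil pointer; with a populated one it skips the head entry and searches the rest of the list.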
- erik