Marc Fournier wrote:
> On 2013-02-13, at 3:54 PM, Rick Macklem <rmack...@uoguelph.ca> wrote:
> 
> >>
> > The pid that is in "T" state for the "ps auxlH".
> 
> Different server, last kernel update on Jan 22nd, httpd process this
> time instead of du last time.
> 
> I've attached:
> 
> ps auxlH
> ps auxlH of just the processes that are in TJ state (6 httpd servers)
> procstat output for each of the 6 processes
> 
> They are included as attachments … if these don't make it through, let
> me know, just figured I'd try and keep it compact ...
Ok, I took a look and the interesting process seems to be 16693. It is
stopped ("T" state) and several of its threads (22, but not all) have
a procstat like this:
16693 104135 httpd            -                mi_switch+0x186
   thread_suspend_check+0x19f sleepq_catch_signals+0x1c5
   sleepq_timedwait_sig+0x19 _sleep+0x2ca clnt_vc_call+0x763
   clnt_reconnect_call+0xfb newnfs_request+0xadb nfscl_request+0x72
   nfsrpc_accessrpc+0x1df nfs34_access_otw+0x56 nfs_access+0x306
   vn_open_cred+0x5a8 kern_openat+0x20a amd64_syscall+0x540
   Xfast_syscall+0xf7
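
To check how many threads share this stack without reading each line by
hand, something like the following might help. It is only a sketch: it
assumes the output comes from "procstat -kk" (one line per thread, as in
the attachments), and the helper name count_suspended_in_rpc is made up
for illustration.

```shell
# Reads `procstat -kk` output on stdin (one line per thread) and prints the
# number of threads whose kernel stack contains both thread_suspend_check
# and clnt_vc_call, i.e. threads suspended while waiting for an RPC reply.
count_suspended_in_rpc() {
    grep 'thread_suspend_check' | grep -c 'clnt_vc_call'
}

# On the affected machine (pid taken from the trace above):
#   procstat -kk 16693 | count_suspended_in_rpc
```

Comparing that count with the total number of threads in the process
shows how many are parked in the RPC sleep and how many are elsewhere.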

The sleep in clnt_vc_call() is waiting for an RPC reply (while a vnode
lock is held) with the PCATCH | PBDRY flags, since the mount is interruptible.

I can see that thread_suspend_check() is called with an argument of 1
(return_instead == 1), since there is only one call to
thread_suspend_check() in sleepq_catch_signals().

When looking at thread_suspend_check(), I basically got lost, although it
seems that it can only "return_instead" if there is a single thread and
not multiple threads doing this.

If these threads are stuck here and won't return from msleep(), that would
explain the hang.

If they would wake up and return from the msleep() when a wakeup occurs, it
would suggest that there is a lost reply or similar, so the wakeup isn't
occurring.

I also don't know whether a timeout of the msleep() will still occur and
make the msleep() return.

Although it wasn't done to fix this, it looks like jhb@'s recent patch to
head (r246417) might fix this, since it reworks how STOP signals are handled
for interruptible mounts.

Hopefully kib or jhb can provide more insight.

Btw Marc, if you just want this problem to go away, I suspect getting rid
of the "intr" mount option would do that.
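
To find which mounts would need changing, a rough filter like this could
be used. It is a sketch that assumes the NFS mounts are configured in
/etc/fstab with "intr" in the options field; the helper name
nfs_intr_mounts is hypothetical.

```shell
# Reads fstab-format lines on stdin and prints the mount point of every
# NFS entry whose comma-separated options field contains "intr".
# Commented-out lines are skipped, and "nointr" does not match.
nfs_intr_mounts() {
    awk '$1 !~ /^#/ && $3 == "nfs" && $4 ~ /(^|,)intr(,|$)/ { print $2 }'
}

# On the client:
#   nfs_intr_mounts < /etc/fstab
```

Each mount point it prints would then need "intr" dropped from its
options and the file system remounted.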

rick

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
