[mdb-discuss] Questions about overdue callout_execute

Oliver Yang Mon, 27 Aug 2007 16:48:36 +0800

Hi All,

I have several questions about overdue callout_execute


1. How does the system handle a overdue callout_execute?

2. As far as I know, setrun entry in callout table will be handled as a 
softcall, which is a software interrupt(softlevel1) with PIL 1, right?
 
3. If we found there are so many overdue setrun entries in callout 
table, what we should check by using mdb?

Here are background information with my other questions:

I had encountered a system hang issue in my test env and I forced a 
crash dump file successfully.

In this crash dump file, I found a e1000g driver thread blocked on 
cv_timedwait, but we supposed it should return after 1 tick, but it had 
never returned.

 > ::callout ! grep fffffe80044e5c80
setrun                   fffffe80044e5c80 3ffffffffffe1a80            
1036d (T-33730)  ---> it was overdue about 33730 ticks.

stack pointer for thread fffffe80044e5c80: fffffe80044e5880
[ fffffe80044e5880 _resume_from_idle+0xf8() ]
 fffffe80044e58c0 swtch+0x167()
 fffffe80044e5930 cv_timedwait+0xcf(ffffffff82f76390, ffffffff82f76388, 
1036d)
 fffffe80044e59c0 cv_timedwait_sig+0x2cc(ffffffff82f76390, 
ffffffff82f76388, 1036d)
 fffffe80044e5a70 e1000g_send+0x136(ffffffff82f76370, ffffffffac2fce40)
 fffffe80044e5ab0 e1000g_m_tx+0x6f(ffffffff82f76000, ffffffffa21f8180)
 fffffe80044e5ad0 dls_tx+0x1d(ffffffff82f2ec80, ffffffffa21f8180)
 fffffe80044e5b20 dld_wsrv+0xcc(ffffffff894acb70)
 fffffe80044e5b50 runservice+0x42(ffffffff894acb70)
 fffffe80044e5b80 queue_service+0x42(ffffffff894acb70)
 fffffe80044e5bc0 stream_service+0x73(ffffffff83905740)
 fffffe80044e5c60 taskq_d_thread+0xbb(ffffffff833af820)
 fffffe80044e5c70 thread_start+8()

After I check the callout table, I found the relevant callout entry for 
this thread, and it was overdue for some reason:

 > ::callout ! grep fffffe80044e5c80
setrun                   fffffe80044e5c80 3ffffffffffe1a80            
1036d (T-33730)       --------> This indicate it is overdue.

I also find about 2573 overdue entries in callout table, it was really a 
big number:

 > ::callout
FUNCTION                 ARGUMENT         ID               TIME           
sigalarm2proc            ffffffff9569aae0 7fffffffffffc010            
144a1 (T-17038) -----> This indicated the it was overdue.
sigalarm2proc            ffffffff91bb7510 7fffffffffffe010            
14484 (T-17067)
sigalarm2proc            ffffffff9569c380 7fffffffffffc020            
144a1 (T-17038)
sigalarm2proc            ffffffff95428d48 7fffffffffffc030            
144a1 (T-17038)
sigalarm2proc            ffffffff91bb8db0 7fffffffffffe030            
14483 (T-17068)
sigalarm2proc            ffffffff9542b238 7fffffffffffc040            
144a1 (T-17038)
.....................[snipped]....................................................................

 > ::callout ! grep "T-" | wc -l
    2573

Why we could find so many overdue entries in callout table?

Since the setrun entry will be processed by softcall, I tried to print 
the global softcall list by softhead.
And we did find there are two callout_execute was pending to handled by 
softint:

 > *softhead::list softcall_t sc_next|::print softcall_t
{
    sc_func = callout_execute
    sc_arg = 0xffffffff80219000
    sc_next = softcalls+0x1290
}
{
    sc_func = callout_execute
    sc_arg = 0xffffffff80216000
    sc_next = 0
}

 > softcall_state/J
softcall_state:
softcall_state: 2              

#define SOFT_PEND               0x02    /* softcall list needs processing */

But when I checked the softint by following command, I found no pending 
count in softlevel1 line, why?

I think the softlevel1 should call into softint(), then, the 
callout_execute would be executed.

But it seemed it had never happened, who can give me some ideas about it?


 > ::softint
ADDR             PEND PIL ARG1             ARG2            ISR(s)
ffffffff8277a5c0 0    1   ffffffff8048da80 0               errorq_intr
fffffffffbc05ae8 0    1   0                0               
softlevel1            ------------> no pending count.
ffffffff8277a4c0 0    2   ffffffff8048dd00 0               errorq_intr
fffffffffbc00070 0    2   0                0               cbe_low_level
ffffffff83a706c0 0    4   ffffffff90004d18 0               ghd_doneq_process
ffffffff8ed31880 0    4   ffffffff90004d18 0               
ghd_timeout_softintr
ffffffff83946e00 0    4   ffffffff8f046c40 0               power_soft_intr
ffffffff83a70c00 0    4   ffffffff83b1b000 0               bge_chip_factotum
ffffffff83a70cc0 0    4   ffffffff83b1b000 0               bge_reschedule
ffffffff8277a2c0 0    4   0                0               asysoftintr
ffffffff82f4d100 0    9   ffffffff82f76370 0               
e1000g_tx_softint_worker
ffffffff82f4df00 0    9   ffffffff82f86370 0               
e1000g_tx_softint_worker
ffffffff833b3e80 0    9   ffffffff801af7e8 0               hcdi_soft_intr
ffffffff8277a000 0    9   ffffffff801afb68 0               hcdi_soft_intr
fffffffffbc00030 0    10  0                0               cbe_softclock


Actually, at that time, system still had 1 IDLE CPU, I think the system 
shouldn't have so many overdue callout entries.

 > ::cpuinfo -v
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH 
THREAD           PROC
  0 fffffffffbc27730  1f    1    6 169   no    no t-3    
fffffe80000bfc80 sched
                       |    |    | 
            RUNNING <--+    |    +--> PIL THREAD
              READY         |          10 fffffe80000bfc80
           QUIESCED         |           6 fffffe80000b9c80
             EXISTS         |
             ENABLE         +-->  PRI THREAD           PROC
                                   99 fffffe80000d1c80 sched

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH 
THREAD           PROC
  1 fffffffffbc2f260  1b    1    0  -1   no    no t-17   
fffffe8000401c80 (idle)
                       |    |
            RUNNING <--+    +-->  PRI THREAD           PROC
              READY                60 fffffe80044d9c80 sched
             EXISTS
             ENABLE

What else should I check?

-- 
Cheers,

------------------------------------------------------------
Oliver Yang | Work from office | http://blog.csdn.net/yayong

[mdb-discuss] Questions about overdue callout_execute

Reply via email to