Hi All, I have several questions about overdue callout_execute
1. How does the system handle a overdue callout_execute? 2. As far as I know, setrun entry in callout table will be handled as a softcall, which is a software interrupt(softlevel1) with PIL 1, right? 3. If we found there are so many overdue setrun entries in callout table, what we should check by using mdb? Here are background information with my other questions: I had encountered a system hang issue in my test env and I forced a crash dump file successfully. In this crash dump file, I found a e1000g driver thread blocked on cv_timedwait, but we supposed it should return after 1 tick, but it had never returned. > ::callout ! grep fffffe80044e5c80 setrun fffffe80044e5c80 3ffffffffffe1a80 1036d (T-33730) ---> it was overdue about 33730 ticks. stack pointer for thread fffffe80044e5c80: fffffe80044e5880 [ fffffe80044e5880 _resume_from_idle+0xf8() ] fffffe80044e58c0 swtch+0x167() fffffe80044e5930 cv_timedwait+0xcf(ffffffff82f76390, ffffffff82f76388, 1036d) fffffe80044e59c0 cv_timedwait_sig+0x2cc(ffffffff82f76390, ffffffff82f76388, 1036d) fffffe80044e5a70 e1000g_send+0x136(ffffffff82f76370, ffffffffac2fce40) fffffe80044e5ab0 e1000g_m_tx+0x6f(ffffffff82f76000, ffffffffa21f8180) fffffe80044e5ad0 dls_tx+0x1d(ffffffff82f2ec80, ffffffffa21f8180) fffffe80044e5b20 dld_wsrv+0xcc(ffffffff894acb70) fffffe80044e5b50 runservice+0x42(ffffffff894acb70) fffffe80044e5b80 queue_service+0x42(ffffffff894acb70) fffffe80044e5bc0 stream_service+0x73(ffffffff83905740) fffffe80044e5c60 taskq_d_thread+0xbb(ffffffff833af820) fffffe80044e5c70 thread_start+8() After I check the callout table, I found the relevant callout entry for this thread, and it was overdue for some reason: > ::callout ! grep fffffe80044e5c80 setrun fffffe80044e5c80 3ffffffffffe1a80 1036d (T-33730) --------> This indicate it is overdue. I also find about 2573 overdue entries in callout table, it was really a big number: > ::callout FUNCTION ARGUMENT ID TIME sigalarm2proc ffffffff9569aae0 7fffffffffffc010 144a1 (T-17038) -----> This indicated the it was overdue. sigalarm2proc ffffffff91bb7510 7fffffffffffe010 14484 (T-17067) sigalarm2proc ffffffff9569c380 7fffffffffffc020 144a1 (T-17038) sigalarm2proc ffffffff95428d48 7fffffffffffc030 144a1 (T-17038) sigalarm2proc ffffffff91bb8db0 7fffffffffffe030 14483 (T-17068) sigalarm2proc ffffffff9542b238 7fffffffffffc040 144a1 (T-17038) .....................[snipped].................................................................... > ::callout ! grep "T-" | wc -l 2573 Why we could find so many overdue entries in callout table? Since the setrun entry will be processed by softcall, I tried to print the global softcall list by softhead. And we did find there are two callout_execute was pending to handled by softint: > *softhead::list softcall_t sc_next|::print softcall_t { sc_func = callout_execute sc_arg = 0xffffffff80219000 sc_next = softcalls+0x1290 } { sc_func = callout_execute sc_arg = 0xffffffff80216000 sc_next = 0 } > softcall_state/J softcall_state: softcall_state: 2 #define SOFT_PEND 0x02 /* softcall list needs processing */ But when I checked the softint by following command, I found no pending count in softlevel1 line, why? I think the softlevel1 should call into softint(), then, the callout_execute would be executed. But it seemed it had never happened, who can give me some ideas about it? > ::softint ADDR PEND PIL ARG1 ARG2 ISR(s) ffffffff8277a5c0 0 1 ffffffff8048da80 0 errorq_intr fffffffffbc05ae8 0 1 0 0 softlevel1 ------------> no pending count. ffffffff8277a4c0 0 2 ffffffff8048dd00 0 errorq_intr fffffffffbc00070 0 2 0 0 cbe_low_level ffffffff83a706c0 0 4 ffffffff90004d18 0 ghd_doneq_process ffffffff8ed31880 0 4 ffffffff90004d18 0 ghd_timeout_softintr ffffffff83946e00 0 4 ffffffff8f046c40 0 power_soft_intr ffffffff83a70c00 0 4 ffffffff83b1b000 0 bge_chip_factotum ffffffff83a70cc0 0 4 ffffffff83b1b000 0 bge_reschedule ffffffff8277a2c0 0 4 0 0 asysoftintr ffffffff82f4d100 0 9 ffffffff82f76370 0 e1000g_tx_softint_worker ffffffff82f4df00 0 9 ffffffff82f86370 0 e1000g_tx_softint_worker ffffffff833b3e80 0 9 ffffffff801af7e8 0 hcdi_soft_intr ffffffff8277a000 0 9 ffffffff801afb68 0 hcdi_soft_intr fffffffffbc00030 0 10 0 0 cbe_softclock Actually, at that time, system still had 1 IDLE CPU, I think the system shouldn't have so many overdue callout entries. > ::cpuinfo -v ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC 0 fffffffffbc27730 1f 1 6 169 no no t-3 fffffe80000bfc80 sched | | | RUNNING <--+ | +--> PIL THREAD READY | 10 fffffe80000bfc80 QUIESCED | 6 fffffe80000b9c80 EXISTS | ENABLE +--> PRI THREAD PROC 99 fffffe80000d1c80 sched ID ADDR FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD PROC 1 fffffffffbc2f260 1b 1 0 -1 no no t-17 fffffe8000401c80 (idle) | | RUNNING <--+ +--> PRI THREAD PROC READY 60 fffffe80044d9c80 sched EXISTS ENABLE What else should I check? -- Cheers, ------------------------------------------------------------ Oliver Yang | Work from office | http://blog.csdn.net/yayong