#2362: soft lockup: spin lock conflict in madwifi r4100-20090929
-------------------------------+--------------------------------------------
 Reporter:  anonymous          |           Type:  defect
   Status:  new                |       Priority:  major
Component:  madwifi: other     |        Version:  v0.9.4
 Keywords:  r4100 soft lockup  | Patch_attached:  0
  Pending:  0                  |
-------------------------------+--------------------------------------------
We've found a soft lockup with madwifi-r4100-20090929 when running hostapd. The kernel is 2.6.31.6 and the cards involved are Ubiquiti SR2 (AR5212).

The lockup occurs when domlme() triggers a transmit. The TX queue is locked on the transmit path (ath_mgtstart() -> ath_tx_start()), an interrupt arrives, the TX tasklet runs as a softirq on the way out of that interrupt, and it tries to lock the same TX queue. Because the tasklet is running on the same CPU that already holds the lock, the holder can never resume to release it, so the CPU spins until the soft lockup watchdog fires.

It occurs randomly, with a higher occurrence under load, probably because load increases the chance that a domlme()-triggered management frame is being transmitted at the moment the TX tasklet runs.
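To make the suspected interleaving concrete, below is a small userspace C model of it. It is a sketch under stated assumptions, not madwifi code: model_lock_t, tx_tasklet(), interrupt_arrives() and mgmt_transmit() are hypothetical stand-ins for the txq spinlock, ath_tx_tasklet()/ath_tx_processq(), irq_exit() and the domlme() transmit path. A bounded spin replaces the kernel's unbounded one, so the deadlock shows up as a false return instead of a hang. The disable_bh = true branch models taking the lock the way spin_lock_bh() would (bottom halves disabled) purely for contrast; this ticket does not establish that as the correct fix.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a non-recursive kernel spinlock. */
typedef struct {
    atomic_bool held;
} model_lock_t;

/* Spin at most max_spins times. Returning false means "in the kernel this
 * would spin forever", which is what the soft lockup watchdog reports. */
static bool model_lock_bounded(model_lock_t *l, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (!atomic_exchange(&l->held, true))
            return true;
    }
    return false;
}

static void model_unlock(model_lock_t *l)
{
    assert(atomic_load(&l->held)); /* unlocking a free lock is a bug */
    atomic_store(&l->held, false);
}

static model_lock_t txq_lock;   /* stands in for the txq spinlock */
static bool bh_disabled;        /* models local_bh_disable() state */
static bool tasklet_pending;    /* softirq raised but deferred */

/* Stands in for ath_tx_tasklet() -> ath_tx_processq() -> ATH_TXQ_LOCK(). */
static bool tx_tasklet(void)
{
    if (!model_lock_bounded(&txq_lock, 100000))
        return false;           /* lock never became free: lockup */
    model_unlock(&txq_lock);
    return true;
}

/* Models irq_exit(): pending softirqs run immediately unless BHs are off. */
static bool interrupt_arrives(void)
{
    if (bh_disabled) {
        tasklet_pending = true; /* deferred until BHs are re-enabled */
        return true;
    }
    return tx_tasklet();
}

/* Models the domlme() -> ath_mgtstart() -> ath_tx_start() path.
 * disable_bh = false: lock taken like plain spin_lock() (the reported case).
 * disable_bh = true:  lock taken like spin_lock_bh() (hypothetical variant). */
static bool mgmt_transmit(bool disable_bh)
{
    bool ok;

    atomic_store(&txq_lock.held, false);
    tasklet_pending = false;
    bh_disabled = disable_bh;

    model_lock_bounded(&txq_lock, 1);   /* process context takes the lock */
    ok = interrupt_arrives();           /* IRQ hits while the lock is held */
    model_unlock(&txq_lock);

    bh_disabled = false;                /* models local_bh_enable() */
    if (tasklet_pending)
        ok = tx_tasklet();              /* deferred tasklet runs now */
    return ok;
}
```

In this model, mgmt_transmit(false) returns false because the tasklet can never acquire a lock that its own CPU is holding, while mgmt_transmit(true) returns true because the tasklet is deferred until the lock has been dropped.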
We have crash dumps from a number of APs, but the backtrace is identical in all cases:
{{{
[19780.472004] BUG: soft lockup - CPU#0 stuck for 61s! [hostapd:2089]
[19780.472004] Modules linked in: wlan_ccmp(F) wlan_tkip(F) wlan_wep(F) wlan_xa
[19780.472004]
[19780.472004] Pid: 2089, comm: hostapd Tainted: PF (2.6.31.6 #3)
[19780.472004] EIP: 0060:[<c174562e>] EFLAGS: 00000297 CPU: 0
[19780.472004] EIP is at _spin_lock+0x15/0x19
[19780.472004] EAX: df9a7828 EBX: 00000002 ECX: df9a7818 EDX: 00004a49
[19780.472004] ESI: df9a6340 EDI: df9a6340 EBP: dd275bf8 ESP: dd275bf8
[19780.472004] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[19780.472004] CR0: 8005003b CR2: b77bc000 CR3: 1d223000 CR4: 000006d0
[19780.472004] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[19780.472004] DR6: ffff0ff0 DR7: 00000400
[19780.472004] Call Trace:
[19780.472004] [<e0f2feab>] ath_tx_processq+0x98/0x61d [ath_pci]
[19780.472004] [<c104da73>] ? sched_clock_cpu+0x128/0x132
[19780.472004] [<c10d2833>] ? __getblk+0x1d/0x3e
[19780.472004] [<e0f306d8>] ath_tx_tasklet+0x5a/0xd8 [ath_pci]
[19780.472004] [<c103c2d9>] tasklet_action+0x5d/0x9a
[19780.472004] [<c103bff4>] __do_softirq+0x9f/0x140
[19780.472004] [<c103c13c>] irq_exit+0x2d/0x60
[19780.472004] [<c10045d8>] do_IRQ+0x7d/0x93
[19780.472004] [<c1003049>] common_interrupt+0x29/0x30
[19780.472004] [<e0f2fd99>] ? ath_tx_start+0xcc8/0xd42 [ath_pci]
[19780.472004] [<e0f2a679>] ath_mgtstart+0x1b9/0x226 [ath_pci]
[19780.472004] [<e0e2ce3b>] ieee80211_mgmt_output+0x13a/0x142 [wlan]
[19780.472004] [<e0e2f146>] ieee80211_send_mgmt+0xa42/0xa7c [wlan]
[19780.472004] [<e0e36a4f>] domlme+0x2d/0x38 [wlan]
[19780.472004] [<e0e36c22>] ieee80211_ioctl_setmlme+0x17d/0x22e [wlan]
[19780.472004] [<c16b88f2>] ioctl_private_iw_point+0xa0/0xf1
[19780.472004] [<c16b880c>] ? get_priv_descr_and_size+0x6a/0xb0
[19780.472004] [<c16b8987>] ioctl_private_call+0x44/0x5b
[19780.472004] [<e0e36aa5>] ? ieee80211_ioctl_setmlme+0x0/0x22e [wlan]
[19780.472004] [<c16b8a2e>] wireless_process_ioctl+0x90/0xb5
[19780.472004] [<e0e36aa5>] ? ieee80211_ioctl_setmlme+0x0/0x22e [wlan]
[19780.472004] [<c16b8abe>] wext_ioctl_dispatch+0x3f/0x50
[19780.472004] [<c16b8705>] ? ioctl_standard_call+0x0/0x9d
[19780.472004] [<c16b8943>] ? ioctl_private_call+0x0/0x5b
[19780.472004] [<c16b8af5>] wext_handle_ioctl+0x26/0x58
[19780.472004] [<c16b8705>] ? ioctl_standard_call+0x0/0x9d
[19780.472004] [<c16b8943>] ? ioctl_private_call+0x0/0x5b
[19780.472004] [<c15fe9d0>] dev_ioctl+0x214/0x220
[19780.472004] [<c15efeb8>] ? sock_ioctl+0x0/0x1ee
[19780.472004] [<c15f009a>] sock_ioctl+0x1e2/0x1ee
[19780.472004] [<c15efeb8>] ? sock_ioctl+0x0/0x1ee
[19780.472004] [<c10c1a57>] vfs_ioctl+0x27/0x68
[19780.472004] [<c10c24d3>] do_vfs_ioctl+0x174/0x17f
[19780.472004] [<c10c251d>] sys_ioctl+0x3f/0x5a
[19780.472004] [<c1002a65>] syscall_call+0x7/0xb
[19780.472004] Kernel panic - not syncing: softlockup: hung tasks
[19780.472004] Pid: 2089, comm: hostapd Tainted: PF 2.6.31.6 #3
[19780.472004] Call Trace:
[19780.472004] [<c1036cf3>] panic+0x38/0xe2
[19780.472004] [<c1072a24>] softlockup_tick+0x13d/0x145
[19780.472004] [<c1040123>] run_local_timers+0x17/0x19
[19780.472004] [<c103ff76>] update_process_times+0x24/0x4e
[19780.472004] [<c1055239>] tick_sched_timer+0x6a/0x98
[19780.472004] [<c104c333>] __run_hrtimer+0x5f/0x8c
[19780.472004] [<c104c45b>] hrtimer_interrupt+0xfb/0x146
[19780.472004] [<c101a381>] local_apic_timer_interrupt+0x42/0x47
[19780.472004] [<c101a3b4>] smp_apic_timer_interrupt+0x2e/0x3d
[19780.472004] [<c1003316>] apic_timer_interrupt+0x2a/0x30
[19780.472004] [<c174562e>] ? _spin_lock+0x15/0x19
[19780.472004] [<e0f2feab>] ath_tx_processq+0x98/0x61d [ath_pci]
[19780.472004] [<c104da73>] ? sched_clock_cpu+0x128/0x132
[19780.472004] [<c10d2833>] ? __getblk+0x1d/0x3e
[19780.472004] [<e0f306d8>] ath_tx_tasklet+0x5a/0xd8 [ath_pci]
[19780.472004] [<c103c2d9>] tasklet_action+0x5d/0x9a
[19780.472004] [<c103bff4>] __do_softirq+0x9f/0x140
[19780.472004] [<c103c0bb>] do_softirq+0x26/0x2b
[19780.472004] [<c103c13c>] irq_exit+0x2d/0x60
[19780.472004] [<c10045d8>] do_IRQ+0x7d/0x93
[19780.472004] [<c1003049>] common_interrupt+0x29/0x30
[19780.472004] [<e0f2fd99>] ? ath_tx_start+0xcc8/0xd42 [ath_pci]
[19780.472004] [<e0f2a679>] ath_mgtstart+0x1b9/0x226 [ath_pci]
[19780.472004] [<e0e2ce3b>] ieee80211_mgmt_output+0x13a/0x142 [wlan]
[19780.472004] [<e0e2f146>] ieee80211_send_mgmt+0xa42/0xa7c [wlan]
[19780.472004] [<e0e36a4f>] domlme+0x2d/0x38 [wlan]
[19780.472004] [<e0e36c22>] ieee80211_ioctl_setmlme+0x17d/0x22e [wlan]
[19780.472004] [<c16b88f2>] ioctl_private_iw_point+0xa0/0xf1
[19780.472004] [<c16b880c>] ? get_priv_descr_and_size+0x6a/0xb0
[19780.472004] [<c16b8987>] ioctl_private_call+0x44/0x5b
[19780.472004] [<e0e36aa5>] ? ieee80211_ioctl_setmlme+0x0/0x22e [wlan]
[19780.472004] [<c16b8a2e>] wireless_process_ioctl+0x90/0xb5
[19780.472004] [<e0e36aa5>] ? ieee80211_ioctl_setmlme+0x0/0x22e [wlan]
[19780.472004] [<c16b8abe>] wext_ioctl_dispatch+0x3f/0x50
[19780.472004] [<c16b8705>] ? ioctl_standard_call+0x0/0x9d
[19780.472004] [<c16b8943>] ? ioctl_private_call+0x0/0x5b
[19780.472004] [<c16b8af5>] wext_handle_ioctl+0x26/0x58
[19780.472004] [<c16b8705>] ? ioctl_standard_call+0x0/0x9d
[19780.472004] [<c16b8943>] ? ioctl_private_call+0x0/0x5b
[19780.472004] [<c15fe9d0>] dev_ioctl+0x214/0x220
[19780.472004] [<c15efeb8>] ? sock_ioctl+0x0/0x1ee
[19780.472004] [<c15f009a>] sock_ioctl+0x1e2/0x1ee
[19780.472004] [<c15efeb8>] ? sock_ioctl+0x0/0x1ee
[19780.472004] [<c10c1a57>] vfs_ioctl+0x27/0x68
[19780.472004] [<c10c24d3>] do_vfs_ioctl+0x174/0x17f
[19780.472004] [<c10c251d>] sys_ioctl+0x3f/0x5a
[19780.472004] [<c1002a65>] syscall_call+0x7/0xb
[19780.472004] Rebooting in 10 seconds..
}}}

ath_tx_processq+0x98 resolves to the line marked below in ath_tx_processq() in ath/if_ath.c. The CPU is stuck trying to acquire the txq spinlock:
{{{
	for (;;) {
		if (uapsdq)
			ATH_TXQ_UAPSDQ_LOCK_IRQ(txq);
		else
			ATH_TXQ_LOCK(txq);	/* ******* THIS LINE ******** */
		txq->axq_intrcnt = 0;	/* reset periodic desc intr count */
		bf = STAILQ_FIRST(&txq->axq_q);
		/* ... */
	}
}}}

We hit it approximately once every 2-6 hours on APs under heavy load.

--
Ticket URL: <http://madwifi-project.org/ticket/2362>
madwifi-project.org <http://madwifi-project.org/>
Multiband Atheros Driver for Wireless Fidelity