On Mon, 2015-09-14 at 23:59 +0200, Francois Romieu wrote: > > [...] > > [308309.574551] 8139cp 0000:00:0b.0 eth1: Transmit timeout, status > c 2b 0 80ff > > Rx and Tx are enabled. > > Instant (untested) hack below.
Thanks; I'll try that. In fact since updating to 4.2 the problem has got worse — now the whole machine dies: [ 232.064630] ------------[ cut here ]------------ [ 232.069282] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x1e5/0x200() [ 232.077840] NETDEV WATCHDOG: eth1 (8139cp): transmit queue 0 timed out [ 232.084380] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 aes_i586 [ 232.157787] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-gx+ #25 [ 232.163982] 00000000 c10313eb dee95000 00000000 ffffff32 00000258 c1031446 00000009 [ 232.171988] dec0bf74 c13c3afc dec0bf8c c1272ef5 c13bfe82 0000012f c13c3afc dee95000 [ 232.179978] e04cfd3c 00000000 dee95000 dee95240 00000258 80000100 c1272d10 dee95000 [ 232.188012] Call Trace: [ 232.190482] [<c10313eb>] ? warn_slowpath_common+0x5b/0x90 [ 232.196063] [<c1031446>] ? warn_slowpath_fmt+0x26/0x30 [ 232.201307] [<c1272ef5>] ? dev_watchdog+0x1e5/0x200 [ 232.206317] [<c1272d10>] ? qdisc_rcu_free+0x30/0x30 [ 232.211307] [<c105b19e>] ? call_timer_fn.isra.7+0xe/0x60 [ 232.216811] [<c1272d10>] ? qdisc_rcu_free+0x30/0x30 [ 232.221794] [<c105b33d>] ? run_timer_softirq+0xfd/0x1b0 [ 232.227221] [<c1033777>] ? __do_softirq+0xa7/0x190 [ 232.232117] [<c10336d0>] ? __hrtimer_tasklet_trampoline+0x20/0x20 [ 232.238395] [<c100336b>] ? do_softirq_own_stack+0x1b/0x20 [ 232.243881] <IRQ> [<c1003165>] ? do_IRQ+0x35/0xa0 [ 232.248904] [<c131e2a9>] ? common_interrupt+0x29/0x30 [ 232.254141] [<c104007b>] ? put_unbound_pool+0x17b/0x1a0 [ 232.259470] [<c1008bf2>] ? default_idle+0x2/0x10 [ 232.264213] [<c1009106>] ? arch_cpu_idle+0x6/0x10 [ 232.269026] [<c104eee5>] ? cpu_startup_entry+0xf5/0x190 [ 232.274459] [<c142a9b9>] ? start_kernel+0x2e5/0x2e8 [ 232.279432] ---[ end trace 30ae4e701c36b431 ]--- [ 232.284167] 8139cp 0000:00:0b.0 eth1: Transmit timeout, status c 2b 1 80ac [ 260.106382] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper:0] [ 260.113515] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 aes_i586 [ 260.116369] CPU: 0 PID: 0 Comm: swapper Tainted: G W 4.2.0-gx+ #25 [ 260.116369] task: c13f7540 ti: c13f0000 task.ti: c13f0000 [ 260.116369] EIP: 0060:[<c131d6fa>] EFLAGS: 00200292 CPU: 0 [ 260.116369] EIP is at _raw_spin_unlock_irqrestore+0xa/0x10 [ 260.116369] EAX: 00000000 EBX: 00200292 ECX: e04d8100 EDX: 00200292 [ 260.116369] ESI: 00000000 EDI: ffffff32 EBP: 00000258 ESP: dec0bf90 [ 260.116369] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 [ 260.116369] CR0: 8005003b CR2: 098c4c50 CR3: 1ef88000 CR4: 00000090 [ 260.116369] Stack: [ 260.116369] dee95000 c1272ecb dee95000 dee95240 00000258 80000100 c1272d10 dee95000 [ 260.116369] 00000100 c105b19e 00000410 c1272d10 c105b33d 00000000 c146d524 00000001 [ 260.116369] 00000002 c1033777 00000002 ffffe577 c13f2000 0000000a 00200000 c13f1f70 [ 260.116369] Call Trace: [ 260.116369] [<c1272ecb>] ? dev_watchdog+0x1bb/0x200 [ 260.116369] [<c1272d10>] ? qdisc_rcu_free+0x30/0x30 [ 260.116369] [<c105b19e>] ? call_timer_fn.isra.7+0xe/0x60 [ 260.116369] [<c1272d10>] ? qdisc_rcu_free+0x30/0x30 [ 260.116369] [<c105b33d>] ? run_timer_softirq+0xfd/0x1b0 [ 260.116369] [<c1033777>] ? __do_softirq+0xa7/0x190 [ 260.116369] [<c10336d0>] ? __hrtimer_tasklet_trampoline+0x20/0x20 [ 260.116369] [<c100336b>] ? do_softirq_own_stack+0x1b/0x20 [ 260.116369] <IRQ> [ 260.116369] [<c1003165>] ? do_IRQ+0x35/0xa0 [ 260.116369] [<c131e2a9>] ? common_interrupt+0x29/0x30 [ 260.116369] [<c104007b>] ? put_unbound_pool+0x17b/0x1a0 [ 260.116369] [<c1008bf2>] ? default_idle+0x2/0x10 [ 260.116369] [<c1009106>] ? arch_cpu_idle+0x6/0x10 [ 260.116369] [<c104eee5>] ? cpu_startup_entry+0xf5/0x190 [ 260.116369] [<c142a9b9>] ? start_kernel+0x2e5/0x2e8 [ 260.116369] Code: 00 b8 01 00 00 00 c3 8d 76 00 8d bc 27 00 00 00 00 e8 db 2e d3 ff c3 8d 76 00 8d bc 27 00 00 00 00 53 89 d3 e8 c8 2e d3 ff 53 9d <5b> c3 8d 74 26 00 e8 bb 2e d3 ff fb c3 89 f6 8d bc 27 00 00 00 At this point even sysrq on the serial console isn't working; I'm going to have to go and physically reset it. (gdb) list *dev_watchdog+0x1bb 0xc1272ecb is in dev_watchdog (net/sched/sch_generic.c:304). 299 } 300 301 if (some_queue_timedout) { 302 WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n", 303 dev->name, netdev_drivername(dev), i); 304 dev->netdev_ops ->ndo_tx_timeout(dev); 305 } 306 if (!mod_timer(&dev->watchdog_timer, 307 round_jiffies(jiffies + 308 dev ->watchdog_timeo))) So that _raw_spin_unlock_irqrestore() is going to be a tailcall from the end of cp_tx_timeout(). Moderately confused that it dies on *unlock*. _raw_spin_lock_irqrestore+0xa is the instruction after a 'popf', which makes me wonder if it dies in an IRQ storm. Although then *some* of the NMI watchdog invocations would surely show an IRQ on the stack, but they don't; they're all right on the same instruction. -- David Woodhouse Open Source Technology Centre david.woodho...@intel.com Intel Corporation
smime.p7s
Description: S/MIME cryptographic signature