On Mon, 2015-09-14 at 23:59 +0200, Francois Romieu wrote:
> 
> [...]
> > [308309.574551] 8139cp 0000:00:0b.0 eth1: Transmit timeout, status 
> c   2b    0 80ff
> 
> Rx and Tx are enabled.
> 
> Instant (untested) hack below.

Thanks; I'll try that. In fact since updating to 4.2 the problem has
got worse — now the whole machine dies:

[  232.064630] ------------[ cut here ]------------                             
[  232.069282] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 
dev_watchdog+0x1e5/0x200()                                                      
           
[  232.077840] NETDEV WATCHDOG: eth1 (8139cp): transmit queue 0 timed out       
[  232.084380] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe 
nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE 
xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit 
xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox 
ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp 
nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack 
iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit 
act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc 
sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 
nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter 
ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 
aes_i586                                                               
[  232.157787] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-gx+ #25            
[  232.163982]  00000000 c10313eb dee95000 00000000 ffffff32 00000258 c1031446 
00000009                                                                        
 
[  232.171988]  dec0bf74 c13c3afc dec0bf8c c1272ef5 c13bfe82 0000012f c13c3afc 
dee95000                                                                        
 
[  232.179978]  e04cfd3c 00000000 dee95000 dee95240 00000258 80000100 c1272d10 
dee95000                                                                        
 
[  232.188012] Call Trace:                                                      
[  232.190482]  [<c10313eb>] ? warn_slowpath_common+0x5b/0x90                   
[  232.196063]  [<c1031446>] ? warn_slowpath_fmt+0x26/0x30                      
[  232.201307]  [<c1272ef5>] ? dev_watchdog+0x1e5/0x200                         
[  232.206317]  [<c1272d10>] ? qdisc_rcu_free+0x30/0x30                         
[  232.211307]  [<c105b19e>] ? call_timer_fn.isra.7+0xe/0x60                    
[  232.216811]  [<c1272d10>] ? qdisc_rcu_free+0x30/0x30                         
[  232.221794]  [<c105b33d>] ? run_timer_softirq+0xfd/0x1b0                     
[  232.227221]  [<c1033777>] ? __do_softirq+0xa7/0x190                          
[  232.232117]  [<c10336d0>] ? __hrtimer_tasklet_trampoline+0x20/0x20           
[  232.238395]  [<c100336b>] ? do_softirq_own_stack+0x1b/0x20                   
[  232.243881]  <IRQ>  [<c1003165>] ? do_IRQ+0x35/0xa0                          
[  232.248904]  [<c131e2a9>] ? common_interrupt+0x29/0x30                       
[  232.254141]  [<c104007b>] ? put_unbound_pool+0x17b/0x1a0                     
[  232.259470]  [<c1008bf2>] ? default_idle+0x2/0x10                            
[  232.264213]  [<c1009106>] ? arch_cpu_idle+0x6/0x10                           
[  232.269026]  [<c104eee5>] ? cpu_startup_entry+0xf5/0x190                     
[  232.274459]  [<c142a9b9>] ? start_kernel+0x2e5/0x2e8                         
[  232.279432] ---[ end trace 30ae4e701c36b431 ]---                             
[  232.284167] 8139cp 0000:00:0b.0 eth1: Transmit timeout, status  c   2b    1 
80ac                                                                            
 
[  260.106382] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper:0]
[  260.113515] Modules linked in: sch_teql 8139cp mii iptable_nat pppoe 
nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE 
xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit 
xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT solos_pci pppox 
ppp_async nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_nat_ftp 
nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack 
iptable_raw iptable_mangle iptable_filter ip_tables crc_ccitt act_skbedit 
act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc 
sch_ingress ledtrig_heartbeat ledtrig_gpio ip6t_REJECT nf_reject_ipv6 
nf_log_ipv6 nf_log_common ip6table_raw ip6table_mangle ip6table_filter 
ip6_tables x_tables pppoatm ppp_generic slhc br2684 atm geode_aes cbc arc4 
aes_i586                                                               
[  260.116369] CPU: 0 PID: 0 Comm: swapper Tainted: G        W       4.2.0-gx+ 
#25                                                                             
 
[  260.116369] task: c13f7540 ti: c13f0000 task.ti: c13f0000                    
[  260.116369] EIP: 0060:[<c131d6fa>] EFLAGS: 00200292 CPU: 0                   
[  260.116369] EIP is at _raw_spin_unlock_irqrestore+0xa/0x10                   
[  260.116369] EAX: 00000000 EBX: 00200292 ECX: e04d8100 EDX: 00200292          
[  260.116369] ESI: 00000000 EDI: ffffff32 EBP: 00000258 ESP: dec0bf90          
[  260.116369]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068                    
[  260.116369] CR0: 8005003b CR2: 098c4c50 CR3: 1ef88000 CR4: 00000090          
[  260.116369] Stack:                                                           
[  260.116369]  dee95000 c1272ecb dee95000 dee95240 00000258 80000100 c1272d10 
dee95000                                                                        
 
[  260.116369]  00000100 c105b19e 00000410 c1272d10 c105b33d 00000000 c146d524 
00000001                                                                        
 
[  260.116369]  00000002 c1033777 00000002 ffffe577 c13f2000 0000000a 00200000 
c13f1f70                                                                        
 
[  260.116369] Call Trace:                                                      
[  260.116369]  [<c1272ecb>] ? dev_watchdog+0x1bb/0x200                         
[  260.116369]  [<c1272d10>] ? qdisc_rcu_free+0x30/0x30                         
[  260.116369]  [<c105b19e>] ? call_timer_fn.isra.7+0xe/0x60                    
[  260.116369]  [<c1272d10>] ? qdisc_rcu_free+0x30/0x30                         
[  260.116369]  [<c105b33d>] ? run_timer_softirq+0xfd/0x1b0                     
[  260.116369]  [<c1033777>] ? __do_softirq+0xa7/0x190                          
[  260.116369]  [<c10336d0>] ? __hrtimer_tasklet_trampoline+0x20/0x20           
[  260.116369]  [<c100336b>] ? do_softirq_own_stack+0x1b/0x20                   
[  260.116369]  <IRQ>                                                           
[  260.116369]  [<c1003165>] ? do_IRQ+0x35/0xa0                                 
[  260.116369]  [<c131e2a9>] ? common_interrupt+0x29/0x30                       
[  260.116369]  [<c104007b>] ? put_unbound_pool+0x17b/0x1a0                     
[  260.116369]  [<c1008bf2>] ? default_idle+0x2/0x10                            
[  260.116369]  [<c1009106>] ? arch_cpu_idle+0x6/0x10                           
[  260.116369]  [<c104eee5>] ? cpu_startup_entry+0xf5/0x190                     
[  260.116369]  [<c142a9b9>] ? start_kernel+0x2e5/0x2e8                         
[  260.116369] Code: 00 b8 01 00 00 00 c3 8d 76 00 8d bc 27 00 00 00 00 e8 db 
2e d3 ff c3 8d 76 00 8d bc 27 00 00 00 00 53 89 d3 e8 c8 2e d3 ff 53 9d <5b> c3 
8d 74 26 00 e8 bb 2e d3 ff fb c3 89 f6 8d bc 27 00 00 00                        
  

At this point even sysrq on the serial console isn't working; I'm
going to have to go and physically reset it.

(gdb) list *dev_watchdog+0x1bb
0xc1272ecb is in dev_watchdog (net/sched/sch_generic.c:304).
299                             }
300     
301                             if (some_queue_timedout) {
302                                     WARN_ONCE(1, KERN_INFO
"NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n",
303                                            dev->name,
netdev_drivername(dev), i);
304                                     dev->netdev_ops
->ndo_tx_timeout(dev);
305                             }
306                             if (!mod_timer(&dev->watchdog_timer,
307                                            round_jiffies(jiffies
+
308                                                          dev
->watchdog_timeo)))

So that _raw_spin_unlock_irqrestore() is going to be a tailcall from
the end of cp_tx_timeout(). Moderately confused that it dies on *unlock*.

_raw_spin_lock_irqrestore+0xa is the instruction after a 'popf', which
makes me wonder if it dies in an IRQ storm. Although then *some* of the
NMI watchdog invocations would surely show an IRQ on the stack, but
they
don't; they're all right on the same instruction.


-- 
David Woodhouse                            Open Source Technology Centre
david.woodho...@intel.com                              Intel Corporation

Attachment: smime.p7s
Description: S/MIME cryptographic signature

Reply via email to