We think you may be having link issues, since the problem is intermittent and moves from machine to machine. If it were more repeatable on a single system we could do more. Please check the link at the switch port to see if it thinks it has link for that failing system.
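For example, something along these lines on the failing host would show whether the NIC itself still reports link (a rough sketch; the interface name eth0 is taken from the log below):

    ethtool eth0 | grep -E 'Speed|Duplex|Link detected'
    ip -s link show dev eth0    # carrier state plus RX/TX packet and error counters

Comparing that with the port status and error counters on the switch side would tell us whether the two ends disagree.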
Thanks.

Cheers,
John

> -----Original Message-----
> From: Fujinaka, Todd [mailto:todd.fujin...@intel.com]
> Sent: Monday, June 8, 2015 2:53 PM
> To: Fujinaka, Todd; Brandon Whaley; e1000-devel@lists.sourceforge.net
> Subject: Re: [E1000-devel] igb driver sometimes stops responding after dkms build
>
> ethregs would be my choice. ethtool -d also gives some information.
>
> Todd Fujinaka
> Software Application Engineer
> Networking Division (ND)
> Intel Corporation
> todd.fujin...@intel.com
> (503) 712-4565
>
> -----Original Message-----
> From: Fujinaka, Todd [mailto:todd.fujin...@intel.com]
> Sent: Monday, June 08, 2015 2:42 PM
> To: Brandon Whaley; e1000-devel@lists.sourceforge.net
> Subject: Re: [E1000-devel] igb driver sometimes stops responding after dkms build
>
> It's difficult to troubleshoot a problem that's erratic and hard to replicate. A
> dump of the registers and status from the link partner would be a good idea. I
> wonder if it isn't something triggered on the other side (is it a switch?) that
> sees something odd and disables the port, and a reboot just takes long enough
> that the port comes back?
>
> Todd Fujinaka
> Software Application Engineer
> Networking Division (ND)
> Intel Corporation
> todd.fujin...@intel.com
> (503) 712-4565
>
> -----Original Message-----
> From: Brandon Whaley [mailto:redkr...@gmail.com]
> Sent: Monday, June 08, 2015 10:27 AM
> To: e1000-devel@lists.sourceforge.net
> Subject: [E1000-devel] igb driver sometimes stops responding after dkms build
>
> I use dkms to build the igb driver after new kernel installs on my fleet of
> servers using the following commands after every yum update:
>
> dkms build -m igb -v 5.2.18
> dkms install -m igb -v 5.2.18
>
> About once a month, one of my boxes (a different one each time) will stop
> responding after this. Nothing I do is able to recover network connectivity
> short of a reboot (not loading/unloading the driver, restarting networking,
> etc.) and since these are production machines, it causes some downtime for us.
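> (To be concrete about "loading/unloading the driver" and "restarting networking"
> above -- the exact commands vary per box, so treat this as a sketch:
>
>     rmmod igb && modprobe igb      # reload the out-of-tree module
>     service network restart        # CentOS/RHEL 6 style network restart
>
> Neither gets traffic flowing again once a box is in this state.)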
> Below is what you see in the syslog when the event occurs:
>
> Jun 8 12:29:19 localhost kernel: [147419.444969] ------------[ cut here ]------------
> Jun 8 12:29:19 localhost kernel: [147419.444978] WARNING: at net/sched/sch_generic.c:267 dev_watchdog+0x26b/0x280() (Tainted: G --------------- T)
> Jun 8 12:29:19 localhost kernel: [147419.444981] Hardware name: X9DRL-3F/iF
> Jun 8 12:29:19 localhost kernel: [147419.444982] NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
> Jun 8 12:29:19 localhost kernel: [147419.444984] Modules linked in: mpt3sas mpt2sas raid_class mptctl mptbase ip6t_rt ipt_addrtype xt_policy aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 aes_generic cbc kcare(U) vzethdev pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vziolimit vzdquota ip6t_REJECT xfrm6_mode_tunnel xfrm4_mode_tunnel nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_netlink xt_comment nfsd ip6_tunnel ip_vs ipip xt_NFQUEUE xt_pkttype ecryptfs(T) ip_gre ip_tunnel ipt_MASQUERADE nf_nat_irc xt_helper nf_conntrack_irc nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6t_LOG xt_connlimit xt_recent pppoatm atm vzrst vzcpt nfs lockd fscache auth_rpcgss nfs_acl sunrpc xfrm4_mode_transport xfrm6_mode_transport ccm authenc esp6 ah6 cnic uio xfrm4_tunnel tunnel4 ipcomp6 xfrm6_tunnel tunnel6 ipcomp xfrm_ipcomp esp4 ah4 af_key arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt fuse tun xt_MARK xt_mark vzevent autofs4 vznetdev vzmon vzdev ipt
> Jun 8 12:29:19 localhost kernel: _REDIRECT xt_owner nf_nat_ftp nf_conntrack_ftp iptable_nat nf_nat xt_state xt_length xt_hl xt_tcpmss xt_TCPMSS xt_multiport xt_limit nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipt_LOG xt_DSCP xt_dscp ipt_REJECT iptable_mangle xt_set iptable_filter iptable_raw ip_tables ip6table_mangle ip6table_filter ip6table_raw ip6_tables ipv6 ip_set_hash_ip ip_set nfnetlink iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler acpi_pad e1000e(U) ses enclosure sg igb(U) dca i2c_algo_bit ptp pps_core sb_edac edac_core i2c_i801 i2c_core lpc_ich mfd_core shpchp tcp_htcp ext4 jbd2 mbcache sd_mod crc_t10dif isci libsas scsi_transport_sas ahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> Jun 8 12:29:19 localhost kernel: [147419.445094] Pid: 0, comm: swapper veid: 0 Tainted: G --------------- T 2.6.32-042stab108.2 #1
> Jun 8 12:29:19 localhost kernel: [147419.445096] Call Trace:
> Jun 8 12:29:19 localhost kernel: [147419.445098] <IRQ> [<ffffffff8107b827>] ? warn_slowpath_common+0x87/0xc0
> Jun 8 12:29:19 localhost kernel: [147419.445109] [<ffffffff8107b916>] ? warn_slowpath_fmt+0x46/0x50
> Jun 8 12:29:19 localhost kernel: [147419.445112] [<ffffffff8148fccb>] ? dev_watchdog+0x26b/0x280
> Jun 8 12:29:19 localhost kernel: [147419.445117] [<ffffffff81015009>] ? sched_clock+0x9/0x10
> Jun 8 12:29:19 localhost kernel: [147419.445123] [<ffffffff8108f6cc>] ? run_timer_softirq+0x1bc/0x380
> Jun 8 12:29:19 localhost kernel: [147419.445126] [<ffffffff8148fa60>] ? dev_watchdog+0x0/0x280
> Jun 8 12:29:19 localhost kernel: [147419.445130] [<ffffffff81034ddd>] ? lapic_next_event+0x1d/0x30
> Jun 8 12:29:19 localhost kernel: [147419.445134] [<ffffffff81084c7d>] ? __do_softirq+0x10d/0x250
> Jun 8 12:29:19 localhost kernel: [147419.445139] [<ffffffff8100c48c>] ? call_softirq+0x1c/0x30
> Jun 8 12:29:19 localhost kernel: [147419.445142] [<ffffffff810102b5>] ? do_softirq+0x65/0xa0
> Jun 8 12:29:19 localhost kernel: [147419.445145] [<ffffffff81084a9d>] ? irq_exit+0xcd/0xd0
> Jun 8 12:29:19 localhost kernel: [147419.445149] [<ffffffff8153f44a>] ? smp_apic_timer_interrupt+0x4a/0x60
> Jun 8 12:29:19 localhost kernel: [147419.445152] [<ffffffff8100bc93>] ? apic_timer_interrupt+0x13/0x20
> Jun 8 12:29:19 localhost kernel: [147419.445154] <EOI> [<ffffffff812fa20e>] ? intel_idle+0xde/0x170
> Jun 8 12:29:19 localhost kernel: [147419.445160] [<ffffffff812fa1f1>] ? intel_idle+0xc1/0x170
> Jun 8 12:29:19 localhost kernel: [147419.445166] [<ffffffff81435d27>] ? cpuidle_idle_call+0xa7/0x140
> Jun 8 12:29:19 localhost kernel: [147419.445170] [<ffffffff8100a026>] ? cpu_idle+0xb6/0x110
> Jun 8 12:29:19 localhost kernel: [147419.445174] [<ffffffff8152df04>] ? start_secondary+0x2be/0x301
> Jun 8 12:29:19 localhost kernel: [147419.445179] ---[ end trace 2179b48f00e92658 ]---
> Jun 8 12:29:19 localhost kernel: [147419.445180] Tainting kernel with flag 0x9
> Jun 8 12:29:19 localhost kernel: [147419.445182] Pid: 0, comm: swapper veid: 0 Tainted: G --------------- T 2.6.32-042stab108.2 #1
> Jun 8 12:29:19 localhost kernel: [147419.445184] Call Trace:
> Jun 8 12:29:19 localhost kernel: [147419.445185] <IRQ> [<ffffffff8107b6b1>] ? add_taint+0x71/0x80
> Jun 8 12:29:19 localhost kernel: [147419.445190] [<ffffffff8107b834>] ? warn_slowpath_common+0x94/0xc0
> Jun 8 12:29:19 localhost kernel: [147419.445193] [<ffffffff8107b916>] ? warn_slowpath_fmt+0x46/0x50
> Jun 8 12:29:19 localhost kernel: [147419.445196] [<ffffffff8148fccb>] ? dev_watchdog+0x26b/0x280
> Jun 8 12:29:19 localhost kernel: [147419.445199] [<ffffffff81015009>] ? sched_clock+0x9/0x10
> Jun 8 12:29:19 localhost kernel: [147419.445204] [<ffffffff8108f6cc>] ? run_timer_softirq+0x1bc/0x380
> Jun 8 12:29:19 localhost kernel: [147419.445206] [<ffffffff8148fa60>] ? dev_watchdog+0x0/0x280
> Jun 8 12:29:19 localhost kernel: [147419.445209] [<ffffffff81034ddd>] ? lapic_next_event+0x1d/0x30
> Jun 8 12:29:19 localhost kernel: [147419.445213] [<ffffffff81084c7d>] ? __do_softirq+0x10d/0x250
> Jun 8 12:29:19 localhost kernel: [147419.445217] [<ffffffff8100c48c>] ? call_softirq+0x1c/0x30
> Jun 8 12:29:19 localhost kernel: [147419.445219] [<ffffffff810102b5>] ? do_softirq+0x65/0xa0
> Jun 8 12:29:19 localhost kernel: [147419.445222] [<ffffffff81084a9d>] ? irq_exit+0xcd/0xd0
> Jun 8 12:29:19 localhost kernel: [147419.445225] [<ffffffff8153f44a>] ? smp_apic_timer_interrupt+0x4a/0x60
> Jun 8 12:29:19 localhost kernel: [147419.445227] [<ffffffff8100bc93>] ? apic_timer_interrupt+0x13/0x20
> Jun 8 12:29:19 localhost kernel: [147419.445229] <EOI> [<ffffffff812fa20e>] ? intel_idle+0xde/0x170
> Jun 8 12:29:19 localhost kernel: [147419.445233] [<ffffffff812fa1f1>] ? intel_idle+0xc1/0x170
> Jun 8 12:29:19 localhost kernel: [147419.445238] [<ffffffff81435d27>] ? cpuidle_idle_call+0xa7/0x140
> Jun 8 12:29:19 localhost kernel: [147419.445241] [<ffffffff8100a026>] ? cpu_idle+0xb6/0x110
> Jun 8 12:29:19 localhost kernel: [147419.445243] [<ffffffff8152df04>] ? start_secondary+0x2be/0x301
> Jun 8 12:29:27 localhost kernel: [147427.448304] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun 8 12:29:45 localhost kernel: [147445.478853] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun 8 12:29:59 localhost kernel: [147459.496009] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun 8 12:30:13 localhost kernel: [147473.459481] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun 8 12:30:27 localhost kernel: [147487.460716] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun 8 12:30:45 localhost kernel: [147505.498234] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun 8 12:31:03 localhost kernel: [147523.513873] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
>
> Is there any more information I can provide the next time this happens? If
> trends continue I should see it again in 3-5 weeks and will be able to collect
> necessary info then. Unfortunately I've yet to find a way to replicate this on
> demand.

------------------------------------------------------------------------------
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
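For reference, a capture along the lines suggested above, run the next time a box drops off the network, might look like this (a sketch; eth0 is the interface from the log, the output paths are illustrative, and ethregs is Intel's register-dump tool, so plain ethtool output is shown as the generally available fallback):

    ethtool -d eth0 > /tmp/eth0-regdump.txt    # register dump from the adapter
    ethtool -S eth0 > /tmp/eth0-stats.txt      # driver and MAC statistics
    ethtool eth0    > /tmp/eth0-link.txt       # negotiated speed/duplex and link state
    dmesg           > /tmp/dmesg.txt           # kernel log, including the watchdog trace

plus the port status and error counters from the link partner (the switch), as suggested earlier in the thread.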