We think you may be having link issues, since the problem is intermittent and moves from machine to machine. If it were more repeatable on a single system we could do more. Please check the link at the switch port to see if it thinks it has link for that failing system.
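For example, something along these lines on the failing host would show whether the NIC itself still reports link (a rough sketch; the interface name eth0 is taken from the log below):

    ethtool eth0 | grep -E 'Speed|Duplex|Link detected'
    ip -s link show dev eth0    # carrier state plus RX/TX packet and error counters

Comparing that with the port status and error counters on the switch side would tell us whether the two ends disagree.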
Thanks.

Cheers,
John

> -----Original Message-----
> From: Fujinaka, Todd [mailto:todd.fujin...@intel.com]
> Sent: Monday, June 8, 2015 2:53 PM
> To: Fujinaka, Todd; Brandon Whaley; e1000-devel@lists.sourceforge.net
> Subject: Re: [E1000-devel] igb driver sometimes stops responding after dkms build
>
> ethregs would be my choice. ethtool -d also gives some information.
>
> Todd Fujinaka
> Software Application Engineer
> Networking Division (ND)
> Intel Corporation
> todd.fujin...@intel.com
> (503) 712-4565
>
> -----Original Message-----
> From: Fujinaka, Todd [mailto:todd.fujin...@intel.com]
> Sent: Monday, June 08, 2015 2:42 PM
> To: Brandon Whaley; e1000-devel@lists.sourceforge.net
> Subject: Re: [E1000-devel] igb driver sometimes stops responding after dkms build
>
> It's difficult to troubleshoot a problem that's erratic and hard to replicate. A
> dump of the registers and status from the link partner would be a good idea. I
> wonder if it isn't something triggered on the other side (is it a switch?) that
> sees something odd and disables the port, and a reboot just takes long enough
> that the port comes back?
>
> Todd Fujinaka
> Software Application Engineer
> Networking Division (ND)
> Intel Corporation
> todd.fujin...@intel.com
> (503) 712-4565
>
> -----Original Message-----
> From: Brandon Whaley [mailto:redkr...@gmail.com]
> Sent: Monday, June 08, 2015 10:27 AM
> To: e1000-devel@lists.sourceforge.net
> Subject: [E1000-devel] igb driver sometimes stops responding after dkms build
>
> I use dkms to build the igb driver after new kernel installs on my fleet of
> servers using the following commands after every yum update:
>
> dkms build -m igb -v 5.2.18
> dkms install -m igb -v 5.2.18
>
> About once a month, one of my boxes (a different one each time) will stop
> responding after this. Nothing I do is able to recover network connectivity
> short of a reboot (not loading/unloading the driver, restarting networking,
> etc.) and since these are production machines, it causes some downtime for us.
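> (To be concrete about "loading/unloading the driver" and "restarting networking"
> above -- the exact commands vary per box, so treat this as a sketch:
>
>     rmmod igb && modprobe igb      # reload the out-of-tree module
>     service network restart        # CentOS/RHEL 6 style network restart
>
> Neither gets traffic flowing again once a box is in this state.)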
> Below is what you see in the syslog when the event occurs:
>
> Jun 8 12:29:19 localhost kernel: [147419.444969] ------------[ cut here ]------------
> Jun 8 12:29:19 localhost kernel: [147419.444978] WARNING: at net/sched/sch_generic.c:267 dev_watchdog+0x26b/0x280() (Tainted: G --------------- T)
> Jun 8 12:29:19 localhost kernel: [147419.444981] Hardware name: X9DRL-3F/iF
> Jun 8 12:29:19 localhost kernel: [147419.444982] NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
> Jun 8 12:29:19 localhost kernel: [147419.444984] Modules linked in: mpt3sas mpt2sas raid_class mptctl mptbase ip6t_rt ipt_addrtype xt_policy aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 aes_generic cbc kcare(U) vzethdev pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vziolimit vzdquota ip6t_REJECT xfrm6_mode_tunnel xfrm4_mode_tunnel nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_netlink xt_comment nfsd ip6_tunnel ip_vs ipip xt_NFQUEUE xt_pkttype ecryptfs(T) ip_gre ip_tunnel ipt_MASQUERADE nf_nat_irc xt_helper nf_conntrack_irc nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6t_LOG xt_connlimit xt_recent pppoatm atm vzrst vzcpt nfs lockd fscache auth_rpcgss nfs_acl sunrpc xfrm4_mode_transport xfrm6_mode_transport ccm authenc esp6 ah6 cnic uio xfrm4_tunnel tunnel4 ipcomp6 xfrm6_tunnel tunnel6 ipcomp xfrm_ipcomp esp4 ah4 af_key arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt fuse tun xt_MARK xt_mark vzevent autofs4 vznetdev vzmon vzdev ipt
> Jun 8 12:29:19 localhost kernel: _REDIRECT xt_owner nf_nat_ftp nf_conntrack_ftp iptable_nat nf_nat xt_state xt_length xt_hl xt_tcpmss xt_TCPMSS xt_multiport xt_limit nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipt_LOG xt_DSCP xt_dscp ipt_REJECT iptable_mangle xt_set iptable_filter iptable_raw ip_tables ip6table_mangle ip6table_filter ip6table_raw ip6_tables ipv6 ip_set_hash_ip ip_set nfnetlink iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler acpi_pad e1000e(U) ses enclosure sg igb(U) dca i2c_algo_bit ptp pps_core sb_edac edac_core i2c_i801 i2c_core lpc_ich mfd_core shpchp tcp_htcp ext4 jbd2 mbcache sd_mod crc_t10dif isci libsas scsi_transport_sas ahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> Jun 8 12:29:19 localhost kernel: [147419.445094] Pid: 0, comm: swapper veid: 0 Tainted: G --------------- T 2.6.32-042stab108.2 #1
> Jun 8 12:29:19 localhost kernel: [147419.445096] Call Trace:
> Jun 8 12:29:19 localhost kernel: [147419.445098] <IRQ> [<ffffffff8107b827>] ? warn_slowpath_common+0x87/0xc0
> Jun 8 12:29:19 localhost kernel: [147419.445109] [<ffffffff8107b916>] ? warn_slowpath_fmt+0x46/0x50
> Jun 8 12:29:19 localhost kernel: [147419.445112] [<ffffffff8148fccb>] ? dev_watchdog+0x26b/0x280
> Jun 8 12:29:19 localhost kernel: [147419.445117] [<ffffffff81015009>] ? sched_clock+0x9/0x10
> Jun 8 12:29:19 localhost kernel: [147419.445123] [<ffffffff8108f6cc>] ? run_timer_softirq+0x1bc/0x380
> Jun 8 12:29:19 localhost kernel: [147419.445126] [<ffffffff8148fa60>] ? dev_watchdog+0x0/0x280
> Jun 8 12:29:19 localhost kernel: [147419.445130] [<ffffffff81034ddd>] ? lapic_next_event+0x1d/0x30
> Jun 8 12:29:19 localhost kernel: [147419.445134] [<ffffffff81084c7d>] ? __do_softirq+0x10d/0x250
> Jun 8 12:29:19 localhost kernel: [147419.445139] [<ffffffff8100c48c>] ? call_softirq+0x1c/0x30
> Jun 8 12:29:19 localhost kernel: [147419.445142] [<ffffffff810102b5>] ? do_softirq+0x65/0xa0
> Jun 8 12:29:19 localhost kernel: [147419.445145] [<ffffffff81084a9d>] ? irq_exit+0xcd/0xd0
> Jun 8 12:29:19 localhost kernel: [147419.445149] [<ffffffff8153f44a>] ? smp_apic_timer_interrupt+0x4a/0x60
> Jun 8 12:29:19 localhost kernel: [147419.445152] [<ffffffff8100bc93>] ? apic_timer_interrupt+0x13/0x20
> Jun 8 12:29:19 localhost kernel: [147419.445154] <EOI> [<ffffffff812fa20e>] ? intel_idle+0xde/0x170
> Jun 8 12:29:19 localhost kernel: [147419.445160] [<ffffffff812fa1f1>] ? intel_idle+0xc1/0x170
> Jun 8 12:29:19 localhost kernel: [147419.445166] [<ffffffff81435d27>] ? cpuidle_idle_call+0xa7/0x140
> Jun 8 12:29:19 localhost kernel: [147419.445170] [<ffffffff8100a026>] ? cpu_idle+0xb6/0x110
> Jun 8 12:29:19 localhost kernel: [147419.445174] [<ffffffff8152df04>] ? start_secondary+0x2be/0x301
> Jun 8 12:29:19 localhost kernel: [147419.445179] ---[ end trace 2179b48f00e92658 ]---
> Jun 8 12:29:19 localhost kernel: [147419.445180] Tainting kernel with flag 0x9
> Jun 8 12:29:19 localhost kernel: [147419.445182] Pid: 0, comm: swapper veid: 0 Tainted: G --------------- T 2.6.32-042stab108.2 #1
> Jun 8 12:29:19 localhost kernel: [147419.445184] Call Trace:
> Jun 8 12:29:19 localhost kernel: [147419.445185] <IRQ> [<ffffffff8107b6b1>] ? add_taint+0x71/0x80
> Jun 8 12:29:19 localhost kernel: [147419.445190] [<ffffffff8107b834>] ? warn_slowpath_common+0x94/0xc0
> Jun 8 12:29:19 localhost kernel: [147419.445193] [<ffffffff8107b916>] ? warn_slowpath_fmt+0x46/0x50
> Jun 8 12:29:19 localhost kernel: [147419.445196] [<ffffffff8148fccb>] ? dev_watchdog+0x26b/0x280
> Jun 8 12:29:19 localhost kernel: [147419.445199] [<ffffffff81015009>] ? sched_clock+0x9/0x10
> Jun 8 12:29:19 localhost kernel: [147419.445204] [<ffffffff8108f6cc>] ? run_timer_softirq+0x1bc/0x380
> Jun 8 12:29:19 localhost kernel: [147419.445206] [<ffffffff8148fa60>] ? dev_watchdog+0x0/0x280
> Jun 8 12:29:19 localhost kernel: [147419.445209] [<ffffffff81034ddd>] ? lapic_next_event+0x1d/0x30
> Jun 8 12:29:19 localhost kernel: [147419.445213] [<ffffffff81084c7d>] ? __do_softirq+0x10d/0x250
> Jun 8 12:29:19 localhost kernel: [147419.445217] [<ffffffff8100c48c>] ? call_softirq+0x1c/0x30
> Jun 8 12:29:19 localhost kernel: [147419.445219] [<ffffffff810102b5>] ? do_softirq+0x65/0xa0
> Jun 8 12:29:19 localhost kernel: [147419.445222] [<ffffffff81084a9d>] ? irq_exit+0xcd/0xd0
> Jun 8 12:29:19 localhost kernel: [147419.445225] [<ffffffff8153f44a>] ? smp_apic_timer_interrupt+0x4a/0x60
> Jun 8 12:29:19 localhost kernel: [147419.445227] [<ffffffff8100bc93>] ? apic_timer_interrupt+0x13/0x20
> Jun 8 12:29:19 localhost kernel: [147419.445229] <EOI> [<ffffffff812fa20e>] ? intel_idle+0xde/0x170
> Jun 8 12:29:19 localhost kernel: [147419.445233] [<ffffffff812fa1f1>] ? intel_idle+0xc1/0x170
> Jun 8 12:29:19 localhost kernel: [147419.445238] [<ffffffff81435d27>] ? cpuidle_idle_call+0xa7/0x140
> Jun 8 12:29:19 localhost kernel: [147419.445241] [<ffffffff8100a026>] ? cpu_idle+0xb6/0x110
> Jun 8 12:29:19 localhost kernel: [147419.445243] [<ffffffff8152df04>] ? start_secondary+0x2be/0x301
> Jun 8 12:29:27 localhost kernel: [147427.448304] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun 8 12:29:45 localhost kernel: [147445.478853] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun 8 12:29:59 localhost kernel: [147459.496009] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun 8 12:30:13 localhost kernel: [147473.459481] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun 8 12:30:27 localhost kernel: [147487.460716] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun 8 12:30:45 localhost kernel: [147505.498234] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun 8 12:31:03 localhost kernel: [147523.513873] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
>
> Is there any more information I can provide the next time this happens? If
> trends continue I should see it again in 3-5 weeks and will be able to collect
> necessary info then. Unfortunately I've yet to find a way to replicate this on
> demand.

------------------------------------------------------------------------------
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
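For reference, a capture along the lines suggested above, run the next time a box drops off the network, might look like this (a sketch; eth0 is the interface from the log, the output paths are illustrative, and ethregs is Intel's register-dump tool, so plain ethtool output is shown as the generally available fallback):

    ethtool -d eth0 > /tmp/eth0-regdump.txt    # register dump from the adapter
    ethtool -S eth0 > /tmp/eth0-stats.txt      # driver and MAC statistics
    ethtool eth0    > /tmp/eth0-link.txt       # negotiated speed/duplex and link state
    dmesg           > /tmp/dmesg.txt           # kernel log, including the watchdog trace

plus the port status and error counters from the link partner (the switch), as suggested earlier in the thread.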