It is a switch, and I can certainly check the switch next time this
happens.  The only thing that makes me think it's local to the box is the
repeated "Link is Up" messages that occur until reboot.

How would I go about getting a dump of the registers if this happens again?
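
For reference, ethtool can usually produce a register dump with its -d option. Below is a minimal sketch of capturing that dump and related state before rebooting. It assumes ethtool and iproute2 are installed and that the affected interface is eth0. The function name, file names, and output directory are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch: capture NIC state while the fault is still present, before rebooting.
# Assumptions (not from the thread): ethtool and ip are installed; the
# function name, file names and output directory are illustrative only.

collect_nic_state() {
    iface="$1"
    outdir="$2"
    mkdir -p "$outdir" || return 1

    # Each command writes to its own file; a failing command is recorded in
    # errors.txt and does not stop the remaining captures.
    run() {
        name="$1"; shift
        "$@" > "$outdir/$name.txt" 2>&1 || echo "failed: $*" >> "$outdir/errors.txt"
    }

    run driver    ethtool -i "$iface"        # driver name, version, firmware
    run registers ethtool -d "$iface"        # register dump
    run stats     ethtool -S "$iface"        # NIC and per-queue counters
    run link      ethtool "$iface"           # speed, duplex, link state
    run ipstats   ip -s link show "$iface"   # kernel-side interface counters
    run dmesg     dmesg                      # kernel log including the watchdog trace
}

# Example invocation, run as root on the affected box:
# collect_nic_state eth0 "/var/tmp/igb-$(date +%Y%m%d-%H%M%S)"
```

This only covers the local side; the status of the link partner would still have to be read from the switch itself.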

On Mon, Jun 8, 2015 at 5:42 PM Fujinaka, Todd <todd.fujin...@intel.com>
wrote:

> It's difficult to troubleshoot a problem that's erratic and hard to
> replicate. A dump of the registers and status from the link partner would
> be a good idea. I wonder if it isn't something triggered on the other side
> (is it a switch?) that sees something odd and disables the port, and a
> reboot just takes long enough that the port comes back?
>
> Todd Fujinaka
> Software Application Engineer
> Networking Division (ND)
> Intel Corporation
> todd.fujin...@intel.com
> (503) 712-4565
>
> -----Original Message-----
> From: Brandon Whaley [mailto:redkr...@gmail.com]
> Sent: Monday, June 08, 2015 10:27 AM
> To: e1000-devel@lists.sourceforge.net
> Subject: [E1000-devel] igb driver sometimes stops responding after dkms
> build
>
> I use dkms to build the igb driver after new kernel installs on my fleet
> of servers using the following commands after every yum update:
>
> dkms build -m igb -v 5.2.18
> dkms install -m igb -v 5.2.18
>
> About once a month, one of my boxes (a different one each time) will stop
> responding after this.  Nothing I do is able to recover network
> connectivity short of a reboot (not loading/unloading the driver,
> restarting networking, etc.) and since these are production machines, it
> causes some downtime for us.  Below is what you see in the syslog when the
> event occurs:
>
> Jun  8 12:29:19 localhost kernel: [147419.444969] ------------[ cut here ]------------
> Jun  8 12:29:19 localhost kernel: [147419.444978] WARNING: at net/sched/sch_generic.c:267 dev_watchdog+0x26b/0x280() (Tainted: G           ---------------  T)
> Jun  8 12:29:19 localhost kernel: [147419.444981] Hardware name: X9DRL-3F/iF
> Jun  8 12:29:19 localhost kernel: [147419.444982] NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
> Jun  8 12:29:19 localhost kernel: [147419.444984] Modules linked in: mpt3sas mpt2sas raid_class mptctl mptbase ip6t_rt ipt_addrtype xt_policy aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 aes_generic cbc kcare(U) vzethdev pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vziolimit vzdquota ip6t_REJECT xfrm6_mode_tunnel xfrm4_mode_tunnel nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_netlink xt_comment nfsd ip6_tunnel ip_vs ipip xt_NFQUEUE xt_pkttype ecryptfs(T) ip_gre ip_tunnel ipt_MASQUERADE nf_nat_irc xt_helper nf_conntrack_irc nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6t_LOG xt_connlimit xt_recent pppoatm atm vzrst vzcpt nfs lockd fscache auth_rpcgss nfs_acl sunrpc xfrm4_mode_transport xfrm6_mode_transport ccm authenc esp6 ah6 cnic uio xfrm4_tunnel tunnel4 ipcomp6 xfrm6_tunnel tunnel6 ipcomp xfrm_ipcomp esp4 ah4 af_key arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt fuse tun xt_MARK xt_mark vzevent autofs4 vznetdev vzmon vzdev ipt
> Jun  8 12:29:19 localhost kernel: _REDIRECT xt_owner nf_nat_ftp nf_conntrack_ftp iptable_nat nf_nat xt_state xt_length xt_hl xt_tcpmss xt_TCPMSS xt_multiport xt_limit nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipt_LOG xt_DSCP xt_dscp ipt_REJECT iptable_mangle xt_set iptable_filter iptable_raw ip_tables ip6table_mangle ip6table_filter ip6table_raw ip6_tables ipv6 ip_set_hash_ip ip_set nfnetlink iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler acpi_pad e1000e(U) ses enclosure sg igb(U) dca i2c_algo_bit ptp pps_core sb_edac edac_core i2c_i801 i2c_core lpc_ich mfd_core shpchp tcp_htcp ext4 jbd2 mbcache sd_mod crc_t10dif isci libsas scsi_transport_sas ahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> Jun  8 12:29:19 localhost kernel: [147419.445094] Pid: 0, comm: swapper veid: 0 Tainted: G           ---------------  T 2.6.32-042stab108.2 #1
> Jun  8 12:29:19 localhost kernel: [147419.445096] Call Trace:
> Jun  8 12:29:19 localhost kernel: [147419.445098]  <IRQ> [<ffffffff8107b827>] ? warn_slowpath_common+0x87/0xc0
> Jun  8 12:29:19 localhost kernel: [147419.445109]  [<ffffffff8107b916>] ? warn_slowpath_fmt+0x46/0x50
> Jun  8 12:29:19 localhost kernel: [147419.445112]  [<ffffffff8148fccb>] ? dev_watchdog+0x26b/0x280
> Jun  8 12:29:19 localhost kernel: [147419.445117]  [<ffffffff81015009>] ? sched_clock+0x9/0x10
> Jun  8 12:29:19 localhost kernel: [147419.445123]  [<ffffffff8108f6cc>] ? run_timer_softirq+0x1bc/0x380
> Jun  8 12:29:19 localhost kernel: [147419.445126]  [<ffffffff8148fa60>] ? dev_watchdog+0x0/0x280
> Jun  8 12:29:19 localhost kernel: [147419.445130]  [<ffffffff81034ddd>] ? lapic_next_event+0x1d/0x30
> Jun  8 12:29:19 localhost kernel: [147419.445134]  [<ffffffff81084c7d>] ? __do_softirq+0x10d/0x250
> Jun  8 12:29:19 localhost kernel: [147419.445139]  [<ffffffff8100c48c>] ? call_softirq+0x1c/0x30
> Jun  8 12:29:19 localhost kernel: [147419.445142]  [<ffffffff810102b5>] ? do_softirq+0x65/0xa0
> Jun  8 12:29:19 localhost kernel: [147419.445145]  [<ffffffff81084a9d>] ? irq_exit+0xcd/0xd0
> Jun  8 12:29:19 localhost kernel: [147419.445149]  [<ffffffff8153f44a>] ? smp_apic_timer_interrupt+0x4a/0x60
> Jun  8 12:29:19 localhost kernel: [147419.445152]  [<ffffffff8100bc93>] ? apic_timer_interrupt+0x13/0x20
> Jun  8 12:29:19 localhost kernel: [147419.445154]  <EOI> [<ffffffff812fa20e>] ? intel_idle+0xde/0x170
> Jun  8 12:29:19 localhost kernel: [147419.445160]  [<ffffffff812fa1f1>] ? intel_idle+0xc1/0x170
> Jun  8 12:29:19 localhost kernel: [147419.445166]  [<ffffffff81435d27>] ? cpuidle_idle_call+0xa7/0x140
> Jun  8 12:29:19 localhost kernel: [147419.445170]  [<ffffffff8100a026>] ? cpu_idle+0xb6/0x110
> Jun  8 12:29:19 localhost kernel: [147419.445174]  [<ffffffff8152df04>] ? start_secondary+0x2be/0x301
> Jun  8 12:29:19 localhost kernel: [147419.445179] ---[ end trace 2179b48f00e92658 ]---
> Jun  8 12:29:19 localhost kernel: [147419.445180] Tainting kernel with flag 0x9
> Jun  8 12:29:19 localhost kernel: [147419.445182] Pid: 0, comm: swapper veid: 0 Tainted: G           ---------------  T 2.6.32-042stab108.2 #1
> Jun  8 12:29:19 localhost kernel: [147419.445184] Call Trace:
> Jun  8 12:29:19 localhost kernel: [147419.445185]  <IRQ> [<ffffffff8107b6b1>] ? add_taint+0x71/0x80
> Jun  8 12:29:19 localhost kernel: [147419.445190]  [<ffffffff8107b834>] ? warn_slowpath_common+0x94/0xc0
> Jun  8 12:29:19 localhost kernel: [147419.445193]  [<ffffffff8107b916>] ? warn_slowpath_fmt+0x46/0x50
> Jun  8 12:29:19 localhost kernel: [147419.445196]  [<ffffffff8148fccb>] ? dev_watchdog+0x26b/0x280
> Jun  8 12:29:19 localhost kernel: [147419.445199]  [<ffffffff81015009>] ? sched_clock+0x9/0x10
> Jun  8 12:29:19 localhost kernel: [147419.445204]  [<ffffffff8108f6cc>] ? run_timer_softirq+0x1bc/0x380
> Jun  8 12:29:19 localhost kernel: [147419.445206]  [<ffffffff8148fa60>] ? dev_watchdog+0x0/0x280
> Jun  8 12:29:19 localhost kernel: [147419.445209]  [<ffffffff81034ddd>] ? lapic_next_event+0x1d/0x30
> Jun  8 12:29:19 localhost kernel: [147419.445213]  [<ffffffff81084c7d>] ? __do_softirq+0x10d/0x250
> Jun  8 12:29:19 localhost kernel: [147419.445217]  [<ffffffff8100c48c>] ? call_softirq+0x1c/0x30
> Jun  8 12:29:19 localhost kernel: [147419.445219]  [<ffffffff810102b5>] ? do_softirq+0x65/0xa0
> Jun  8 12:29:19 localhost kernel: [147419.445222]  [<ffffffff81084a9d>] ? irq_exit+0xcd/0xd0
> Jun  8 12:29:19 localhost kernel: [147419.445225]  [<ffffffff8153f44a>] ? smp_apic_timer_interrupt+0x4a/0x60
> Jun  8 12:29:19 localhost kernel: [147419.445227]  [<ffffffff8100bc93>] ? apic_timer_interrupt+0x13/0x20
> Jun  8 12:29:19 localhost kernel: [147419.445229]  <EOI> [<ffffffff812fa20e>] ? intel_idle+0xde/0x170
> Jun  8 12:29:19 localhost kernel: [147419.445233]  [<ffffffff812fa1f1>] ? intel_idle+0xc1/0x170
> Jun  8 12:29:19 localhost kernel: [147419.445238]  [<ffffffff81435d27>] ? cpuidle_idle_call+0xa7/0x140
> Jun  8 12:29:19 localhost kernel: [147419.445241]  [<ffffffff8100a026>] ? cpu_idle+0xb6/0x110
> Jun  8 12:29:19 localhost kernel: [147419.445243]  [<ffffffff8152df04>] ? start_secondary+0x2be/0x301
> Jun  8 12:29:27 localhost kernel: [147427.448304] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun  8 12:29:45 localhost kernel: [147445.478853] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun  8 12:29:59 localhost kernel: [147459.496009] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun  8 12:30:13 localhost kernel: [147473.459481] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun  8 12:30:27 localhost kernel: [147487.460716] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun  8 12:30:45 localhost kernel: [147505.498234] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> Jun  8 12:31:03 localhost kernel: [147523.513873] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
>
> Is there any more information I can provide the next time this happens?
> If trends continue I should see it again in 3-5 weeks and will be able to
> collect necessary info then.  Unfortunately I've yet to find a way to
> replicate this on demand.
>
------------------------------------------------------------------------------
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired
