ethregs would be my choice. ethtool -d also gives some information.
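
As a rough sketch of what to capture before rebooting a box in this state, the helper below saves the `ethtool -d` register dump alongside a few related outputs. The function name `collect_nic_state`, the file names, and the default output directory are made up for illustration. ethregs is left out because how it is invoked depends on how it was built and installed.

```shell
#!/bin/sh
# Sketch only: snapshot NIC state into a directory before rebooting.
# collect_nic_state and the file layout are illustrative, not part of any tool.
collect_nic_state() {
    iface="${1:-eth0}"
    outdir="${2:-/tmp/nic-state-$(date +%Y%m%d-%H%M%S)}"
    mkdir -p "$outdir" || return 1

    # Every capture is best-effort: a missing or failing command must not
    # stop the remaining ones, so errors are written into the output file.
    ethtool -d "$iface" > "$outdir/ethtool-regs.txt"   2>&1 || true  # register dump
    ethtool -S "$iface" > "$outdir/ethtool-stats.txt"  2>&1 || true  # NIC statistics
    ethtool -i "$iface" > "$outdir/ethtool-driver.txt" 2>&1 || true  # driver and firmware info
    ethtool    "$iface" > "$outdir/ethtool-link.txt"   2>&1 || true  # link settings
    dmesg               > "$outdir/dmesg.txt"          2>&1 || true  # kernel log

    # Print the directory so the caller knows where the files went.
    echo "$outdir"
}
```

Called as `collect_nic_state eth0`, it prints the directory it wrote to, which can then be archived and attached to a reply.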

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565

-----Original Message-----
From: Fujinaka, Todd [mailto:todd.fujin...@intel.com] 
Sent: Monday, June 08, 2015 2:42 PM
To: Brandon Whaley; e1000-devel@lists.sourceforge.net
Subject: Re: [E1000-devel] igb driver sometimes stops responding after dkms build

It's difficult to troubleshoot a problem that's erratic and hard to replicate. 
A dump of the registers and status from the link partner would be a good idea. 
I wonder if it isn't something triggered on the other side (is it a switch?) 
that sees something odd and disables the port, and a reboot just takes long 
enough that the port comes back?

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujin...@intel.com
(503) 712-4565

-----Original Message-----
From: Brandon Whaley [mailto:redkr...@gmail.com] 
Sent: Monday, June 08, 2015 10:27 AM
To: e1000-devel@lists.sourceforge.net
Subject: [E1000-devel] igb driver sometimes stops responding after dkms build

I use dkms to build the igb driver after new kernel installs on my fleet of 
servers, running the following commands after every yum update:

dkms build -m igb -v 5.2.18
dkms install -m igb -v 5.2.18
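
One thing that may be worth checking: without `-k`, dkms targets the currently running kernel (`uname -r`), which right after a yum update is usually still the old one. A minimal sketch that prints the two commands pinned to an explicit kernel version follows. `igb_dkms_plan` is a hypothetical helper name, and the kernel string passed to it is an assumption about which kernel you want the module built for.

```shell
#!/bin/sh
# Sketch only: print the dkms commands for a given driver and kernel version.
# igb_dkms_plan is an illustrative name; it prints the commands, it does not run them.
igb_dkms_plan() {
    ver="$1"
    kernel="${2:-$(uname -r)}"   # fall back to the running kernel, as dkms itself does
    [ -n "$ver" ] || { echo "usage: igb_dkms_plan <driver-version> [kernel]" >&2; return 1; }
    printf 'dkms build -m igb -v %s -k %s\n'   "$ver" "$kernel"
    printf 'dkms install -m igb -v %s -k %s\n' "$ver" "$kernel"
}
```

For example, `igb_dkms_plan 5.2.18 2.6.32-042stab108.2` prints the build and install commands for that kernel. Afterwards, `modinfo -k 2.6.32-042stab108.2 -F version igb` should show which igb version that kernel will load.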

About once a month, one of my boxes (a different one each time) stops 
responding after this.  Nothing short of a reboot restores network 
connectivity (not unloading and reloading the driver, not restarting 
networking, etc.), and since these are production machines, it causes some 
downtime for us.  Below is what shows up in the syslog when the event occurs:

Jun  8 12:29:19 localhost kernel: [147419.444969] ------------[ cut here ]------------
Jun  8 12:29:19 localhost kernel: [147419.444978] WARNING: at net/sched/sch_generic.c:267 dev_watchdog+0x26b/0x280() (Tainted: G           ---------------  T)
Jun  8 12:29:19 localhost kernel: [147419.444981] Hardware name: X9DRL-3F/iF
Jun  8 12:29:19 localhost kernel: [147419.444982] NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
Jun  8 12:29:19 localhost kernel: [147419.444984] Modules linked in: mpt3sas mpt2sas raid_class mptctl mptbase ip6t_rt ipt_addrtype xt_policy aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 aes_generic cbc kcare(U) vzethdev pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vziolimit vzdquota ip6t_REJECT xfrm6_mode_tunnel xfrm4_mode_tunnel nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_netlink xt_comment nfsd ip6_tunnel ip_vs ipip xt_NFQUEUE xt_pkttype ecryptfs(T) ip_gre ip_tunnel ipt_MASQUERADE nf_nat_irc xt_helper nf_conntrack_irc nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6t_LOG xt_connlimit xt_recent pppoatm atm vzrst vzcpt nfs lockd fscache auth_rpcgss nfs_acl sunrpc xfrm4_mode_transport xfrm6_mode_transport ccm authenc esp6 ah6 cnic uio xfrm4_tunnel tunnel4 ipcomp6 xfrm6_tunnel tunnel6 ipcomp xfrm_ipcomp esp4 ah4 af_key arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt fuse tun xt_MARK xt_mark vzevent autofs4 vznetdev vzmon vzdev ipt
Jun  8 12:29:19 localhost kernel: _REDIRECT xt_owner nf_nat_ftp nf_conntrack_ftp iptable_nat nf_nat xt_state xt_length xt_hl xt_tcpmss xt_TCPMSS xt_multiport xt_limit nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipt_LOG xt_DSCP xt_dscp ipt_REJECT iptable_mangle xt_set iptable_filter iptable_raw ip_tables ip6table_mangle ip6table_filter ip6table_raw ip6_tables ipv6 ip_set_hash_ip ip_set nfnetlink iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler acpi_pad e1000e(U) ses enclosure sg igb(U) dca i2c_algo_bit ptp pps_core sb_edac edac_core i2c_i801 i2c_core lpc_ich mfd_core shpchp tcp_htcp ext4 jbd2 mbcache sd_mod crc_t10dif isci libsas scsi_transport_sas ahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Jun  8 12:29:19 localhost kernel: [147419.445094] Pid: 0, comm: swapper veid: 0 Tainted: G           ---------------  T 2.6.32-042stab108.2 #1
Jun  8 12:29:19 localhost kernel: [147419.445096] Call Trace:
Jun  8 12:29:19 localhost kernel: [147419.445098]  <IRQ>  [<ffffffff8107b827>] ? warn_slowpath_common+0x87/0xc0
Jun  8 12:29:19 localhost kernel: [147419.445109]  [<ffffffff8107b916>] ? warn_slowpath_fmt+0x46/0x50
Jun  8 12:29:19 localhost kernel: [147419.445112]  [<ffffffff8148fccb>] ? dev_watchdog+0x26b/0x280
Jun  8 12:29:19 localhost kernel: [147419.445117]  [<ffffffff81015009>] ? sched_clock+0x9/0x10
Jun  8 12:29:19 localhost kernel: [147419.445123]  [<ffffffff8108f6cc>] ? run_timer_softirq+0x1bc/0x380
Jun  8 12:29:19 localhost kernel: [147419.445126]  [<ffffffff8148fa60>] ? dev_watchdog+0x0/0x280
Jun  8 12:29:19 localhost kernel: [147419.445130]  [<ffffffff81034ddd>] ? lapic_next_event+0x1d/0x30
Jun  8 12:29:19 localhost kernel: [147419.445134]  [<ffffffff81084c7d>] ? __do_softirq+0x10d/0x250
Jun  8 12:29:19 localhost kernel: [147419.445139]  [<ffffffff8100c48c>] ? call_softirq+0x1c/0x30
Jun  8 12:29:19 localhost kernel: [147419.445142]  [<ffffffff810102b5>] ? do_softirq+0x65/0xa0
Jun  8 12:29:19 localhost kernel: [147419.445145]  [<ffffffff81084a9d>] ? irq_exit+0xcd/0xd0
Jun  8 12:29:19 localhost kernel: [147419.445149]  [<ffffffff8153f44a>] ? smp_apic_timer_interrupt+0x4a/0x60
Jun  8 12:29:19 localhost kernel: [147419.445152]  [<ffffffff8100bc93>] ? apic_timer_interrupt+0x13/0x20
Jun  8 12:29:19 localhost kernel: [147419.445154]  <EOI>  [<ffffffff812fa20e>] ? intel_idle+0xde/0x170
Jun  8 12:29:19 localhost kernel: [147419.445160]  [<ffffffff812fa1f1>] ? intel_idle+0xc1/0x170
Jun  8 12:29:19 localhost kernel: [147419.445166]  [<ffffffff81435d27>] ? cpuidle_idle_call+0xa7/0x140
Jun  8 12:29:19 localhost kernel: [147419.445170]  [<ffffffff8100a026>] ? cpu_idle+0xb6/0x110
Jun  8 12:29:19 localhost kernel: [147419.445174]  [<ffffffff8152df04>] ? start_secondary+0x2be/0x301
Jun  8 12:29:19 localhost kernel: [147419.445179] ---[ end trace 2179b48f00e92658 ]---
Jun  8 12:29:19 localhost kernel: [147419.445180] Tainting kernel with flag 0x9
Jun  8 12:29:19 localhost kernel: [147419.445182] Pid: 0, comm: swapper veid: 0 Tainted: G           ---------------  T 2.6.32-042stab108.2 #1
Jun  8 12:29:19 localhost kernel: [147419.445184] Call Trace:
Jun  8 12:29:19 localhost kernel: [147419.445185]  <IRQ>  [<ffffffff8107b6b1>] ? add_taint+0x71/0x80
Jun  8 12:29:19 localhost kernel: [147419.445190]  [<ffffffff8107b834>] ? warn_slowpath_common+0x94/0xc0
Jun  8 12:29:19 localhost kernel: [147419.445193]  [<ffffffff8107b916>] ? warn_slowpath_fmt+0x46/0x50
Jun  8 12:29:19 localhost kernel: [147419.445196]  [<ffffffff8148fccb>] ? dev_watchdog+0x26b/0x280
Jun  8 12:29:19 localhost kernel: [147419.445199]  [<ffffffff81015009>] ? sched_clock+0x9/0x10
Jun  8 12:29:19 localhost kernel: [147419.445204]  [<ffffffff8108f6cc>] ? run_timer_softirq+0x1bc/0x380
Jun  8 12:29:19 localhost kernel: [147419.445206]  [<ffffffff8148fa60>] ? dev_watchdog+0x0/0x280
Jun  8 12:29:19 localhost kernel: [147419.445209]  [<ffffffff81034ddd>] ? lapic_next_event+0x1d/0x30
Jun  8 12:29:19 localhost kernel: [147419.445213]  [<ffffffff81084c7d>] ? __do_softirq+0x10d/0x250
Jun  8 12:29:19 localhost kernel: [147419.445217]  [<ffffffff8100c48c>] ? call_softirq+0x1c/0x30
Jun  8 12:29:19 localhost kernel: [147419.445219]  [<ffffffff810102b5>] ? do_softirq+0x65/0xa0
Jun  8 12:29:19 localhost kernel: [147419.445222]  [<ffffffff81084a9d>] ? irq_exit+0xcd/0xd0
Jun  8 12:29:19 localhost kernel: [147419.445225]  [<ffffffff8153f44a>] ? smp_apic_timer_interrupt+0x4a/0x60
Jun  8 12:29:19 localhost kernel: [147419.445227]  [<ffffffff8100bc93>] ? apic_timer_interrupt+0x13/0x20
Jun  8 12:29:19 localhost kernel: [147419.445229]  <EOI>  [<ffffffff812fa20e>] ? intel_idle+0xde/0x170
Jun  8 12:29:19 localhost kernel: [147419.445233]  [<ffffffff812fa1f1>] ? intel_idle+0xc1/0x170
Jun  8 12:29:19 localhost kernel: [147419.445238]  [<ffffffff81435d27>] ? cpuidle_idle_call+0xa7/0x140
Jun  8 12:29:19 localhost kernel: [147419.445241]  [<ffffffff8100a026>] ? cpu_idle+0xb6/0x110
Jun  8 12:29:19 localhost kernel: [147419.445243]  [<ffffffff8152df04>] ? start_secondary+0x2be/0x301
Jun  8 12:29:27 localhost kernel: [147427.448304] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Jun  8 12:29:45 localhost kernel: [147445.478853] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Jun  8 12:29:59 localhost kernel: [147459.496009] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Jun  8 12:30:13 localhost kernel: [147473.459481] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Jun  8 12:30:27 localhost kernel: [147487.460716] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Jun  8 12:30:45 localhost kernel: [147505.498234] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Jun  8 12:31:03 localhost kernel: [147523.513873] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

Is there any more information I can provide the next time this happens?  If 
the trend continues I should see it again in 3-5 weeks and will be able to 
collect the necessary info then.  Unfortunately I've yet to find a way to 
replicate this on demand.
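
Since the failure cannot be triggered on demand, one option is to watch the syslog for the watchdog message and capture state the moment it appears. The sketch below only does the detection part: it pulls the interface, driver, and queue out of any `NETDEV WATCHDOG` line. `watchdog_hits` is a hypothetical helper name, and the pattern assumes the message format shown in the trace above.

```shell
#!/bin/sh
# Sketch only: extract "<iface> <driver> <queue>" from NETDEV WATCHDOG log lines.
# watchdog_hits is an illustrative name; it reads the named files, or stdin if none.
watchdog_hits() {
    sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) (\([^)]*\)): transmit queue \([0-9]*\) timed out.*/\1 \2 \3/p' "$@"
}
```

Run as `watchdog_hits /var/log/messages` (the log path is an assumption and varies by distribution), it prints one line per timeout, which a cron job or wrapper script could use to decide when to save register dumps before anyone reboots the machine.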
------------------------------------------------------------------------------
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired