Thanks for the input, everyone. I went over notes from previous outages of this kind; we did verify at the time that the switch port was not disabled, but I'll make sure we check that again the next time this happens.
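In case it helps, this is the rough collection script I plan to run on the affected box before rebooting it next time. It is just a sketch: the interface name eth0 and the output paths are illustrative, and it assumes ethtool and iproute2 are installed.

```shell
#!/bin/sh
# Diagnostic collection sketch for the next igb hang (illustrative).
# Run as root on the wedged machine, before rebooting.
IFACE="${1:-eth0}"
OUT="igb-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$OUT"
ethtool -d "$IFACE"      > "$OUT/registers.txt" 2>&1  # full register dump
ethtool -S "$IFACE"      > "$OUT/stats.txt"     2>&1  # driver statistics
ip -s link show "$IFACE" > "$OUT/link.txt"      2>&1  # link state + counters
dmesg | tail -n 200      > "$OUT/dmesg.txt"     2>&1  # recent kernel messages
echo "collected into $OUT"
```

I'd pair that with a note from the switch side (port status, error counters) taken at the same moment, per John's suggestion below.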
I'm unable to run ethregs as root due to what I assume is a security-related kernel config (mmap - try rebooting with iomem=relaxed: Operation not permitted), but ethtool -d works on the interface in question, so I'll gather that information as well the next time this occurs. Thanks again for your time.

On Mon, Jun 8, 2015 at 5:56 PM Ronciak, John <john.ronc...@intel.com> wrote:
> We think that maybe you are having link issues, since the problem is
> intermittent and moves from machine to machine. If it were more repeatable
> on a single system we could do more. Please check the link at the switch
> port to see if it thinks it has link on that failing system.
>
> Thanks.
>
> Cheers,
> John
>
> > -----Original Message-----
> > From: Fujinaka, Todd [mailto:todd.fujin...@intel.com]
> > Sent: Monday, June 8, 2015 2:53 PM
> > To: Fujinaka, Todd; Brandon Whaley; e1000-devel@lists.sourceforge.net
> > Subject: Re: [E1000-devel] igb driver sometimes stops responding after dkms build
> >
> > ethregs would be my choice. ethtool -d also gives some information.
> >
> > Todd Fujinaka
> > Software Application Engineer
> > Networking Division (ND)
> > Intel Corporation
> > todd.fujin...@intel.com
> > (503) 712-4565
> >
> > -----Original Message-----
> > From: Fujinaka, Todd [mailto:todd.fujin...@intel.com]
> > Sent: Monday, June 08, 2015 2:42 PM
> > To: Brandon Whaley; e1000-devel@lists.sourceforge.net
> > Subject: Re: [E1000-devel] igb driver sometimes stops responding after dkms build
> >
> > It's difficult to troubleshoot a problem that's erratic and hard to
> > replicate. A dump of the registers and status from the link partner would
> > be a good idea. I wonder if it isn't something triggered on the other
> > side (is it a switch?) that sees something odd and disables the port, and
> > a reboot just takes long enough that the port comes back?
> >
> > Todd Fujinaka
> > Software Application Engineer
> > Networking Division (ND)
> > Intel Corporation
> > todd.fujin...@intel.com
> > (503) 712-4565
> >
> > -----Original Message-----
> > From: Brandon Whaley [mailto:redkr...@gmail.com]
> > Sent: Monday, June 08, 2015 10:27 AM
> > To: e1000-devel@lists.sourceforge.net
> > Subject: [E1000-devel] igb driver sometimes stops responding after dkms build
> >
> > I use dkms to build the igb driver after new kernel installs on my fleet
> > of servers, using the following commands after every yum update:
> >
> > dkms build -m igb -v 5.2.18
> > dkms install -m igb -v 5.2.18
> >
> > About once a month, one of my boxes (a different one each time) will stop
> > responding after this. Nothing I do is able to recover network
> > connectivity short of a reboot (not loading/unloading the driver,
> > restarting networking, etc.), and since these are production machines, it
> > causes some downtime for us. Below is what you see in the syslog when the
> > event occurs:
> >
> > Jun 8 12:29:19 localhost kernel: [147419.444969] ------------[ cut here ]------------
> > Jun 8 12:29:19 localhost kernel: [147419.444978] WARNING: at net/sched/sch_generic.c:267 dev_watchdog+0x26b/0x280() (Tainted: G --------------- T)
> > Jun 8 12:29:19 localhost kernel: [147419.444981] Hardware name: X9DRL-3F/iF
> > Jun 8 12:29:19 localhost kernel: [147419.444982] NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
> > Jun 8 12:29:19 localhost kernel: [147419.444984] Modules linked in: mpt3sas mpt2sas raid_class mptctl mptbase ip6t_rt ipt_addrtype xt_policy aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 aes_generic cbc kcare(U) vzethdev pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vziolimit vzdquota ip6t_REJECT xfrm6_mode_tunnel xfrm4_mode_tunnel nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_netlink xt_comment nfsd ip6_tunnel ip_vs ipip xt_NFQUEUE xt_pkttype ecryptfs(T) ip_gre ip_tunnel ipt_MASQUERADE nf_nat_irc xt_helper nf_conntrack_irc nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6t_LOG xt_connlimit xt_recent pppoatm atm vzrst vzcpt nfs lockd fscache auth_rpcgss nfs_acl sunrpc xfrm4_mode_transport xfrm6_mode_transport ccm authenc esp6 ah6 cnic uio xfrm4_tunnel tunnel4 ipcomp6 xfrm6_tunnel tunnel6 ipcomp xfrm_ipcomp esp4 ah4 af_key arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt fuse tun xt_MARK xt_mark vzevent autofs4 vznetdev vzmon vzdev ipt
> > Jun 8 12:29:19 localhost kernel: _REDIRECT xt_owner nf_nat_ftp nf_conntrack_ftp iptable_nat nf_nat xt_state xt_length xt_hl xt_tcpmss xt_TCPMSS xt_multiport xt_limit nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipt_LOG xt_DSCP xt_dscp ipt_REJECT iptable_mangle xt_set iptable_filter iptable_raw ip_tables ip6table_mangle ip6table_filter ip6table_raw ip6_tables ipv6 ip_set_hash_ip ip_set nfnetlink iTCO_wdt iTCO_vendor_support ipmi_devintf ipmi_si ipmi_msghandler acpi_pad e1000e(U) ses enclosure sg igb(U) dca i2c_algo_bit ptp pps_core sb_edac edac_core i2c_i801 i2c_core lpc_ich mfd_core shpchp tcp_htcp ext4 jbd2 mbcache sd_mod crc_t10dif isci libsas scsi_transport_sas ahci megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> > Jun 8 12:29:19 localhost kernel: [147419.445094] Pid: 0, comm: swapper veid: 0 Tainted: G --------------- T 2.6.32-042stab108.2 #1
> > Jun 8 12:29:19 localhost kernel: [147419.445096] Call Trace:
> > Jun 8 12:29:19 localhost kernel: [147419.445098] <IRQ> [<ffffffff8107b827>] ? warn_slowpath_common+0x87/0xc0
> > Jun 8 12:29:19 localhost kernel: [147419.445109] [<ffffffff8107b916>] ? warn_slowpath_fmt+0x46/0x50
> > Jun 8 12:29:19 localhost kernel: [147419.445112] [<ffffffff8148fccb>] ? dev_watchdog+0x26b/0x280
> > Jun 8 12:29:19 localhost kernel: [147419.445117] [<ffffffff81015009>] ? sched_clock+0x9/0x10
> > Jun 8 12:29:19 localhost kernel: [147419.445123] [<ffffffff8108f6cc>] ? run_timer_softirq+0x1bc/0x380
> > Jun 8 12:29:19 localhost kernel: [147419.445126] [<ffffffff8148fa60>] ? dev_watchdog+0x0/0x280
> > Jun 8 12:29:19 localhost kernel: [147419.445130] [<ffffffff81034ddd>] ? lapic_next_event+0x1d/0x30
> > Jun 8 12:29:19 localhost kernel: [147419.445134] [<ffffffff81084c7d>] ? __do_softirq+0x10d/0x250
> > Jun 8 12:29:19 localhost kernel: [147419.445139] [<ffffffff8100c48c>] ? call_softirq+0x1c/0x30
> > Jun 8 12:29:19 localhost kernel: [147419.445142] [<ffffffff810102b5>] ? do_softirq+0x65/0xa0
> > Jun 8 12:29:19 localhost kernel: [147419.445145] [<ffffffff81084a9d>] ? irq_exit+0xcd/0xd0
> > Jun 8 12:29:19 localhost kernel: [147419.445149] [<ffffffff8153f44a>] ? smp_apic_timer_interrupt+0x4a/0x60
> > Jun 8 12:29:19 localhost kernel: [147419.445152] [<ffffffff8100bc93>] ? apic_timer_interrupt+0x13/0x20
> > Jun 8 12:29:19 localhost kernel: [147419.445154] <EOI> [<ffffffff812fa20e>] ? intel_idle+0xde/0x170
> > Jun 8 12:29:19 localhost kernel: [147419.445160] [<ffffffff812fa1f1>] ? intel_idle+0xc1/0x170
> > Jun 8 12:29:19 localhost kernel: [147419.445166] [<ffffffff81435d27>] ? cpuidle_idle_call+0xa7/0x140
> > Jun 8 12:29:19 localhost kernel: [147419.445170] [<ffffffff8100a026>] ? cpu_idle+0xb6/0x110
> > Jun 8 12:29:19 localhost kernel: [147419.445174] [<ffffffff8152df04>] ? start_secondary+0x2be/0x301
> > Jun 8 12:29:19 localhost kernel: [147419.445179] ---[ end trace 2179b48f00e92658 ]---
> > Jun 8 12:29:19 localhost kernel: [147419.445180] Tainting kernel with flag 0x9
> > Jun 8 12:29:19 localhost kernel: [147419.445182] Pid: 0, comm: swapper veid: 0 Tainted: G --------------- T 2.6.32-042stab108.2 #1
> > Jun 8 12:29:19 localhost kernel: [147419.445184] Call Trace:
> > Jun 8 12:29:19 localhost kernel: [147419.445185] <IRQ> [<ffffffff8107b6b1>] ? add_taint+0x71/0x80
> > Jun 8 12:29:19 localhost kernel: [147419.445190] [<ffffffff8107b834>] ? warn_slowpath_common+0x94/0xc0
> > Jun 8 12:29:19 localhost kernel: [147419.445193] [<ffffffff8107b916>] ? warn_slowpath_fmt+0x46/0x50
> > Jun 8 12:29:19 localhost kernel: [147419.445196] [<ffffffff8148fccb>] ? dev_watchdog+0x26b/0x280
> > Jun 8 12:29:19 localhost kernel: [147419.445199] [<ffffffff81015009>] ? sched_clock+0x9/0x10
> > Jun 8 12:29:19 localhost kernel: [147419.445204] [<ffffffff8108f6cc>] ? run_timer_softirq+0x1bc/0x380
> > Jun 8 12:29:19 localhost kernel: [147419.445206] [<ffffffff8148fa60>] ? dev_watchdog+0x0/0x280
> > Jun 8 12:29:19 localhost kernel: [147419.445209] [<ffffffff81034ddd>] ? lapic_next_event+0x1d/0x30
> > Jun 8 12:29:19 localhost kernel: [147419.445213] [<ffffffff81084c7d>] ? __do_softirq+0x10d/0x250
> > Jun 8 12:29:19 localhost kernel: [147419.445217] [<ffffffff8100c48c>] ? call_softirq+0x1c/0x30
> > Jun 8 12:29:19 localhost kernel: [147419.445219] [<ffffffff810102b5>] ? do_softirq+0x65/0xa0
> > Jun 8 12:29:19 localhost kernel: [147419.445222] [<ffffffff81084a9d>] ? irq_exit+0xcd/0xd0
> > Jun 8 12:29:19 localhost kernel: [147419.445225] [<ffffffff8153f44a>] ? smp_apic_timer_interrupt+0x4a/0x60
> > Jun 8 12:29:19 localhost kernel: [147419.445227] [<ffffffff8100bc93>] ? apic_timer_interrupt+0x13/0x20
> > Jun 8 12:29:19 localhost kernel: [147419.445229] <EOI> [<ffffffff812fa20e>] ? intel_idle+0xde/0x170
> > Jun 8 12:29:19 localhost kernel: [147419.445233] [<ffffffff812fa1f1>] ? intel_idle+0xc1/0x170
> > Jun 8 12:29:19 localhost kernel: [147419.445238] [<ffffffff81435d27>] ? cpuidle_idle_call+0xa7/0x140
> > Jun 8 12:29:19 localhost kernel: [147419.445241] [<ffffffff8100a026>] ? cpu_idle+0xb6/0x110
> > Jun 8 12:29:19 localhost kernel: [147419.445243] [<ffffffff8152df04>] ? start_secondary+0x2be/0x301
> > Jun 8 12:29:27 localhost kernel: [147427.448304] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> > Jun 8 12:29:45 localhost kernel: [147445.478853] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> > Jun 8 12:29:59 localhost kernel: [147459.496009] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> > Jun 8 12:30:13 localhost kernel: [147473.459481] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> > Jun 8 12:30:27 localhost kernel: [147487.460716] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> > Jun 8 12:30:45 localhost kernel: [147505.498234] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> > Jun 8 12:31:03 localhost kernel: [147523.513873] igb 0000:03:00.0: eth0: igb: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
> >
> > Is there any more information I can provide the next time this happens?
> > If trends continue I should see it again in 3-5 weeks and will be able to
> > collect necessary info then. Unfortunately I've yet to find a way to
> > replicate this on demand.
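One more aside, on the dkms workflow quoted above: the manual build/install pair can usually be avoided by registering the module with AUTOINSTALL enabled, so the kernel package hooks rebuild it on every update. This is only a sketch; the directory below stands in for the real /usr/src/igb-5.2.18 tree, and the MAKE/CLEAN commands assume the out-of-tree igb source layout.

```shell
#!/bin/sh
# Sketch of a dkms.conf enabling automatic rebuilds on kernel updates.
# "./igb-5.2.18" is an illustrative stand-in for /usr/src/igb-5.2.18.
SRC="./igb-5.2.18"
mkdir -p "$SRC"
cat > "$SRC/dkms.conf" <<'EOF'
PACKAGE_NAME="igb"
PACKAGE_VERSION="5.2.18"
BUILT_MODULE_NAME[0]="igb"
BUILT_MODULE_LOCATION[0]="src/"
DEST_MODULE_LOCATION[0]="/kernel/drivers/net/"
MAKE[0]="make -C src/ BUILD_KERNEL=${kernelver}"
CLEAN="make -C src/ clean"
AUTOINSTALL="yes"
EOF
echo "wrote $SRC/dkms.conf"
```

With that in place, `dkms add -m igb -v 5.2.18` once, and subsequent kernel installs trigger `dkms autoinstall` instead of the manual two-step. (The ${kernelver} variable is expanded by dkms itself, hence the quoted heredoc.)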
------------------------------------------------------------------------------
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired