Lars, you can change the driver's msglvl (message level) by running 'ethtool -s ethx msglvl 0x2c00' after the driver loads. When the issue occurs, the driver will then log the hardware ring information to the message log. Please give it a try and send the log after the issue occurs.
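For reference, 0x2c00 is a bitmask of the kernel's NETIF_MSG_* flags. The short POSIX shell sketch below decodes such a mask into the flag names ethtool uses. The bit table is taken from the NETIF_MSG_* definitions in include/linux/netdevice.h; the function name msglvl_names is hypothetical, made up for illustration. Decoding 0x2c00 gives tx_done, rx_status and hw, which, as far as I can tell from the e1000e source, are the bits the driver checks before dumping its registers and TX/RX rings.

```shell
#!/bin/sh
# Sketch: decode an ethtool msglvl bitmask into NETIF_MSG_* flag names.
# Bit values follow NETIF_MSG_* in include/linux/netdevice.h; the lower-case
# names are the ones ethtool accepts for "msglvl <name> on|off".
# msglvl_names is a hypothetical helper, not part of ethtool.

msglvl_names() {
    mask=$(( $1 ))              # accepts hex such as 0x2c00
    out=""
    for entry in 0x0001:drv 0x0002:probe 0x0004:link 0x0008:timer \
                 0x0010:ifdown 0x0020:ifup 0x0040:rx_err 0x0080:tx_err \
                 0x0100:tx_queued 0x0200:intr 0x0400:tx_done \
                 0x0800:rx_status 0x1000:pktdata 0x2000:hw 0x4000:wol; do
        bit=${entry%%:*}        # e.g. "0x0400"
        name=${entry#*:}        # e.g. "tx_done"
        if [ $(( mask & $bit )) -ne 0 ]; then
            out="${out:+$out }$name"   # space-separated, lowest bit first
        fi
    done
    printf '%s\n' "$out"
}

# Decode the mask given as the first argument, or 0x2c00 by default.
msglvl_names "${1:-0x2c00}"
```

Run with no argument it prints the decoding of 0x2c00; pass another mask, for example the "Current message level" value that `ethtool ethx` reports, to decode that instead.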
Just wanted to ask: is this the only board that has this issue? I looked
in our lab but couldn't find an S1200BTL for reproduction.

-Tushar

>-----Original Message-----
>From: Lars Maschke [mailto:[email protected]]
>Sent: Wednesday, March 06, 2013 1:58 AM
>To: Dave, Tushar N
>Cc: [email protected]
>Subject: Re-4: e1000e detected hardware unit hang problem
>
>Hi Tushar,
>
>I tried the built-in driver of the kernels I mentioned (3.3.8, 3.4.32 and
>3.7.9), but I had no success.
>
>I know this because I have seen the error message three or four times on
>the screen. Once it was logged in kern.log, as you can see here:
>
>Mar 1 00:53:36 lightning kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
>Mar 1 00:53:36 lightning kernel: TDH <a7>
>Mar 1 00:53:36 lightning kernel: TDT <c6>
>Mar 1 00:53:36 lightning kernel: next_to_use <c6>
>Mar 1 00:53:36 lightning kernel: next_to_clean <a5>
>Mar 1 00:53:36 lightning kernel: buffer_info[next_to_clean]:
>Mar 1 00:53:36 lightning kernel: time_stamp <5a0f4f>
>Mar 1 00:53:36 lightning kernel: next_to_watch <a7>
>Mar 1 00:53:36 lightning kernel: jiffies <5a1199>
>Mar 1 00:53:36 lightning kernel: next_to_watch.status <0>
>Mar 1 00:53:36 lightning kernel: MAC Status <40080083>
>Mar 1 00:53:36 lightning kernel: PHY Status <796d>
>Mar 1 00:53:36 lightning kernel: PHY 1000BASE-T Status <3c00>
>Mar 1 00:53:36 lightning kernel: PHY Extended Status <3000>
>Mar 1 00:53:36 lightning kernel: PCI Status <10>
>Mar 1 00:53:38 lightning kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
>Mar 1 00:53:38 lightning kernel: TDH <a7>
>Mar 1 00:53:38 lightning kernel: TDT <c6>
>Mar 1 00:53:38 lightning kernel: next_to_use <c6>
>Mar 1 00:53:38 lightning kernel: next_to_clean <a5>
>Mar 1 00:53:38 lightning kernel: buffer_info[next_to_clean]:
>Mar 1 00:53:38 lightning kernel: time_stamp <5a0f4f>
>Mar 1 00:53:38 lightning kernel: next_to_watch <a7>
>Mar 1 00:53:38 lightning kernel: jiffies <5a138d>
>Mar 1 00:53:38 lightning kernel: next_to_watch.status <0>
>Mar 1 00:53:38 lightning kernel: MAC Status <40080083>
>Mar 1 00:53:38 lightning kernel: PHY Status <796d>
>Mar 1 00:53:38 lightning kernel: PHY 1000BASE-T Status <3c00>
>Mar 1 00:53:38 lightning kernel: PHY Extended Status <3000>
>Mar 1 00:53:38 lightning kernel: PCI Status <10>
>Mar 1 00:53:40 lightning kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
>Mar 1 00:53:40 lightning kernel: TDH <a7>
>Mar 1 00:53:40 lightning kernel: TDT <c6>
>Mar 1 00:53:40 lightning kernel: next_to_use <c6>
>Mar 1 00:53:40 lightning kernel: next_to_clean <a5>
>Mar 1 00:53:40 lightning kernel: buffer_info[next_to_clean]:
>Mar 1 00:53:40 lightning kernel: time_stamp <5a0f4f>
>Mar 1 00:53:40 lightning kernel: next_to_watch <a7>
>Mar 1 00:53:40 lightning kernel: jiffies <5a1581>
>Mar 1 00:53:40 lightning kernel: next_to_watch.status <0>
>Mar 1 00:53:40 lightning kernel: MAC Status <40080083>
>Mar 1 00:53:40 lightning kernel: PHY Status <796d>
>Mar 1 00:53:40 lightning kernel: PHY 1000BASE-T Status <3c00>
>Mar 1 00:53:40 lightning kernel: PHY Extended Status <3000>
>Mar 1 00:53:40 lightning kernel: PCI Status <10>
>Mar 1 00:53:42 lightning kernel: e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
>Mar 1 00:53:42 lightning kernel: TDH <a7>
>Mar 1 00:53:42 lightning kernel: TDT <c6>
>Mar 1 00:53:42 lightning kernel: next_to_use <c6>
>Mar 1 00:53:42 lightning kernel: next_to_clean <a5>
>Mar 1 00:53:42 lightning kernel: buffer_info[next_to_clean]:
>Mar 1 00:53:42 lightning kernel: time_stamp <5a0f4f>
>Mar 1 00:53:42 lightning kernel: next_to_watch <a7>
>Mar 1 00:53:42 lightning kernel: jiffies <5a1775>
>Mar 1 00:53:42 lightning kernel: next_to_watch.status <0>
>Mar 1 00:53:42 lightning kernel: MAC Status <40080083>
>Mar 1 00:53:42 lightning kernel: PHY Status <796d>
>Mar 1 00:53:42 lightning kernel: PHY 1000BASE-T Status <3c00>
>Mar 1 00:53:42 lightning kernel: PHY Extended Status <3000>
>Mar 1 00:53:42 lightning kernel: PCI Status <10>
>Mar 1 00:53:43 lightning kernel: ------------[ cut here ]------------
>Mar 1 00:53:43 lightning kernel: WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x1da/0x1f0()
>Mar 1 00:53:43 lightning kernel: Hardware name: S1200BTL
>Mar 1 00:53:43 lightning kernel: NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
>Mar 1 00:53:43 lightning kernel: Modules linked in: af_packet xt_REDIRECT
>xt_DSCP xt_dscp xt_statistic xt_CT xt_NFLOG nfnetlink_log nfnetlink ipt_ULOG
>xt_LOG xt_time xt_connlimit xt_helper xt_realm xt_NFQUEUE xt_tcpmss xt_tcpudp
>xt_addrtype xt_pkttype iptable_raw xt_TPROXY nf_tproxy_core xt_CLASSIFY
>xt_mark xt_hashlimit xt_comment ipt_REJECT xt_length xt_connmark xt_owner
>xt_recent xt_iprange xt_physdev xt_policy iptable_mangle xt_nat xt_multiport
>xt_conntrack nf_nat_ftp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>nf_nat_ipv4 nf_nat nf_conntrack_ftp nf_conntrack iptable_filter ip_tables
>x_tables ipmi_poweroff ipmi_devintf ipmi_si ipmi_watchdog ipmi_msghandler
>minix ppp_mppe ppp_generic slhc tun e1000 usb_storage mousedev usbhid dm_mod
>microcode i2c_i801 i2c_core sr_mod ehci_hcd e1000e(O) firmware_class cdrom
>usbcore usb_common evdev unix
>Mar 1 00:53:43 lightning kernel: Pid: 0, comm: swapper/2 Tainted: G O 3.7.9 #2
>Mar 1 00:53:43 lightning kernel: Call Trace:
>Mar 1 00:53:43 lightning kernel: [<c013696d>] warn_slowpath_common+0x6d/0xa0
>Mar 1 00:53:43 lightning kernel: [<c046a06a>] ? dev_watchdog+0x1da/0x1f0
>Mar 1 00:53:43 lightning kernel: [<c046a06a>] ? dev_watchdog+0x1da/0x1f0
>Mar 1 00:53:43 lightning kernel: [<c0136a1e>] warn_slowpath_fmt+0x2e/0x30
>Mar 1 00:53:43 lightning kernel: [<c046a06a>] dev_watchdog+0x1da/0x1f0
>Mar 1 00:53:43 lightning kernel: [<c0469e90>] ? pfifo_fast_dequeue+0xe0/0xe0
>Mar 1 00:53:43 lightning kernel: [<c0142e7d>] call_timer_fn.isra.32+0x1d/0x80
>Mar 1 00:53:43 lightning kernel: [<c0143051>] run_timer_softirq+0x171/0x180
>Mar 1 00:53:43 lightning kernel: [<c0469e90>] ? pfifo_fast_dequeue+0xe0/0xe0
>Mar 1 00:53:43 lightning kernel: [<c013d6c0>] __do_softirq+0x90/0x140
>Mar 1 00:53:43 lightning kernel: [<c013d630>] ? __tasklet_schedule+0x60/0x60
>Mar 1 00:53:43 lightning kernel: <IRQ> [<c013d875>] ? irq_exit+0x65/0x70
>Mar 1 00:53:43 lightning kernel: [<c0127894>] ? smp_apic_timer_interrupt+0x54/0x90
>Mar 1 00:53:43 lightning kernel: [<c04eee3d>] ? apic_timer_interrupt+0x2d/0x34
>Mar 1 00:53:43 lightning kernel: [<c02e45b6>] ? acpi_idle_enter_bm+0x251/0x286
>Mar 1 00:53:43 lightning kernel: [<c042a335>] ? cpuidle_enter+0x15/0x20
>Mar 1 00:53:43 lightning kernel: [<c042a8fe>] ? cpuidle_idle_call+0x6e/0xd0
>Mar 1 00:53:43 lightning kernel: [<c0111305>] ? cpu_idle+0x55/0xa0
>Mar 1 00:53:43 lightning kernel: [<c04e522b>] ? start_secondary+0x19b/0x1a1
>Mar 1 00:53:43 lightning kernel: ---[ end trace a49ef186404ae76d ]---
>Mar 1 00:53:43 lightning kernel: e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
>
>Greets
>Lars
>
>
>Original Message processed by david(r)
>RE: Re-2: e1000e detected hardware unit hang problem (06-Mar-2013 0:17)
>From: Dave, Tushar N
>To: Lars Maschke
>Cc: [email protected]
>
>
>>-----Original Message-----
>>From: Lars Maschke [mailto:[email protected]]
>>Sent: Monday, March 04, 2013 3:11 PM
>>To: Dave, Tushar N
>>Cc: [email protected]
>>Subject: Re-2: e1000e detected hardware unit hang problem
>>
>>Hello Tushar,
>>
>>first of all, thanks for your quick reply.
>>
>>That's the point: I don't know why this occurs. When I get a chance to
>>look, I see a failure of the e1000e driver on the console. The server is
>>completely down and I can't log on to get any other information.
>>
>>The error isn't logged in dmesg.log or syslog on my Debian system.
>>There is no logging at all after the crash.
>
>We need at least the dmesg from the kernel to start with. How do you
>know that you're getting the detected hardware unit hang message? Is it
>possible to take a picture or connect a serial console to the server to
>retrieve the message after the crash/hang?
>>
>>Only a full reset solves the problem. I get this error every two, three
>>or four days. No special cron job is running at the time of the crash.
>>It occurs only at night, between 0:00 and 2:30.
>>From now on I will try to reset networking every night with the
>>following shell script, and I hope that's a good idea:
>>
>>---
>>#!/bin/sh
>>
>>/etc/init.d/networking stop
>>/sbin/rmmod e1000e
>>/sbin/modprobe e1000e RxIntDelay=0,0 IntMode=1,1
>>/etc/init.d/networking start
>>/sbin/ethtool -K eth0 tso off
>>/sbin/shorewall restart
>>---
>
>How many e1000e devices do you have in the system?
>
>>Do you think the problem can occur when the other Intel "e1000"
>>driver is also loaded on the machine?
>I don't think so; however, if you can, give it a try and let me know if
>anything changes!
>Have you tried the in-kernel e1000e driver?
>
>-Tushar
>
>To: [email protected]
>Cc: [email protected]

_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
