Hello Tushar,

first of all, thanks for your quick reply.

That's the point: I don't know why this occurs. When I get the chance, I see a failure of the e1000e driver on the console. The server is completely down and I can't log on to get any other information. The error isn't logged in dmesg.log or syslog on my Debian system; there is no logging at all after the crash. Only a full reset solves the problem. I get this error every two, three or four days. No special cron job is running at the time of the crash. It occurs only at night, between 0:00h and 2:30h.

From now on I will try to reset networking with the following shell script every night, and I hope that it's a good idea:

---
#!/bin/sh
/etc/init.d/networking stop
/sbin/rmmod e1000e
/sbin/modprobe e1000e RxIntDelay=0,0 IntMode=1,1
/etc/init.d/networking start
/sbin/ethtool -K eth0 tso off
/sbin/shorewall restart
---

As you can see, I have read some other posts and the readme of the official driver. Do you think "RxIntDelay=0,0" can make my problem go away? I also tried IntMode=0,0 with no success. With the kernel's PCI MSI option enabled I can see that the eth0 network card has an exclusive interrupt. I have had this trouble since I first used the S1200BLT board. Normally I use this board with VMware ESXi servers, versions 4.1, 5.0 and 5.1, with no problems. That's why I don't understand this.
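One way to confirm that the "tso off" step in the script above took effect is to read the offload state back after the restart. The following is only a sketch, not part of the original script: the function name tso_state is made up for illustration, and it assumes that "ethtool -k" prints a line of the form "tcp-segmentation-offload: off".

```shell
#!/bin/sh
# Sketch: print the TSO state ("on" or "off") found in `ethtool -k`
# output read from stdin. tso_state is a hypothetical helper name.
# Assumes a top-level line such as "tcp-segmentation-offload: off",
# optionally followed by a marker like "[fixed]".
tso_state() {
    awk -F': *' '$1 == "tcp-segmentation-offload" {
        split($2, parts, " ")   # drop a trailing "[fixed]" if present
        print parts[1]
        exit
    }'
}

# Usage, with the interface name eth0 taken from the script above:
#   /sbin/ethtool -k eth0 | tso_state
```

If this prints "on" after the nightly reset, the ethtool call in the script did not apply, which would be worth knowing before judging whether the workaround helps.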
Here's the output of lspci:

---
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
    Subsystem: Intel Corporation Device 3578
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 44
    Region 0: Memory at c2300000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at 2000 [size=32]
    Region 3: Memory at c2320000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee0f00c  Data: 4172
    Capabilities: [e0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
    Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
        Vector table: BAR=3 offset=00000000
        PBA: BAR=3 offset=00002000
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt+ RxOF+ MalfTLP+ ECRC- UnsupReq+ ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [140 v1] Device Serial Number 00-1e-67-ff-ff-XX-XX-XX
    Kernel driver in use: e1000e
---

None of the kernels I tried (3.3.8, 3.4.32 and 3.7.9) worked with the e1000e driver shipped with it. Only one of the two network cards is attached to the switch. Changing to the other network connector, the flow control settings or the link speed settings didn't solve the problem.

Do you think the problem can occur when the other Intel driver, "e1000", is also loaded on the machine?

Greets
Lars

--------------------------------------------------------------------------
Subject: RE: e1000e detected hardware unit hang problem (04-Mar-2013 23:03)
From: Dave, Tushar N <tushar.n.d...@intel.com>
To: lm

Lars,

Sorry that you have an issue with the board.
Please always send your email to, or CC, e1000-devel@lists.sourceforge.net.

What is the device? Send the output of lspci -vvv (after the issue occurs).
Send the full dmesg log after the issue occurs.
How quickly does the issue occur? Are there any reproduction steps?
Was it ever working before with any known good driver/kernel version?

-Tushar

From: Lars Maschke
Sent: Saturday, March 02, 2013 11:18 PM
To: Dave, Tushar N
Subject: e1000e detected hardware unit hang problem

Dear Tushar,

I saw in some forums regarding our problem that you develop the e1000e driver. We have big trouble with the network chips on our S1200BLT mainboard. Every two or three days we get the "detected hardware unit hang" failure. The complete server is unreachable and there's no possibility to log in on the console. Our system is Debian with kernel 3.7.9.
I've also installed the latest driver from intel.com, as you can see here:

driver: e1000e
version: 2.2.14-NAPI
firmware-version: 2.1-0
bus-info: 0000:03:00.0

That's what I tried:

Kernel 3.3.8
Kernel appends: "pci=nomsi", "pcie_aspm=off"

We have other server boards like the S5520 or S3420 which cause no trouble. Can you tell me if there is any chance to get my server working correctly?

Best Regards
Lars Maschke

To: tushar.n.d...@intel.com
Cc: e1000-devel@lists.sourceforge.net