On Thu, 25/02/2010 at 16:19 +0200, Покотиленко Костик wrote:
> Hi,
>
> We've switched back from 82576 to 82574L+82578DM. 82576 is still there
> for a while before I put it into another server for testing.
Well, I no longer believe the problem lies in the 82576, or in the
82574L and 82578DM, themselves: the 82574L+82578DM configuration has
rebooted after 5 hours of uptime.
Also, the bug report mentions that "pcie_aspm=off" helped some people
with 82574L issues, but it did not help me.
To me this is definitely a motherboard/IRQ/DMA/PCI-E/driver problem.
I'm lost on how to debug further...
Info on the case:
kern.log doesn't show any errors, but there are a lot of them on the
minicom serial console, which probably means SATA has also died and
supports my guess about the motherboard:
====================================================================
[19534.843829] 0000:00:19.0: eth0: Detected Hardware Unit Hang:
[19534.843831] TDH <fe>
[19534.843831] TDT <f>
[19534.843832] next_to_use <f>
[19534.843833] next_to_clean <fe>
[19534.843833] buffer_info[next_to_clean]:
[19534.843834] time_stamp <497af8>
[19534.843835] next_to_watch <fe>
[19534.843835] jiffies <497c7d>
[19534.843836] next_to_watch.status <0>
[19534.843837] MAC Status <40080283>
[19534.843837] PHY Status <796d>
[19534.843838] PHY 1000BASE-T Status <3800>
[19534.843839] PHY Extended Status <2000>
[19534.843839] PCI Status <10>
[19535.852413] 0000:0a:00.0: eth1: Detected Hardware Unit Hang:
[19535.852414] TDH <c5>
[19535.852415] TDT <4>
[19535.852416] next_to_use <4>
[19535.852416] next_to_clean <c5>
[19535.852417] buffer_info[next_to_clean]:
[19535.852417] time_stamp <497af8>
[19535.852418] next_to_watch <c5>
[19535.852419] jiffies <497d7a>
[19535.852420] next_to_watch.status <0>
[19535.852420] MAC Status <80783>
[19535.852421] PHY Status <796d>
[19535.852422] PHY 1000BASE-T Status <7800>
[19535.852422] PHY Extended Status <3000>
[19535.852423] PCI Status <10>
[19536.840763] 0000:00:19.0: eth0: Detected Hardware Unit Hang:
[19536.840765] TDH <fe>
[19536.840765] TDT <f>
[19536.840766] next_to_use <f>
[19536.840767] next_to_clean <fe>
[19536.840767] buffer_info[next_to_clean]:
[19536.840768] time_stamp <497af8>
[19536.840769] next_to_watch <fe>
[19536.840769] jiffies <497e71>
[19536.840770] next_to_watch.status <0>
[19536.840771] MAC Status <40080283>
[19536.840771] PHY Status <796d>
[19536.840772] PHY 1000BASE-T Status <3800>
[19536.840773] PHY Extended Status <2000>
[19536.840773] PCI Status <10>
[19537.849589] 0000:0a:00.0: eth1: Detected Hardware Unit Hang:
[19537.849590] TDH <c5>
[19537.849591] TDT <4>
[19537.849591] next_to_use <4>
[19537.849592] next_to_clean <c5>
[19537.849593] buffer_info[next_to_clean]:
[19537.849593] time_stamp <497af8>
[19537.849594] next_to_watch <c5>
[19537.849595] jiffies <497f6e>
[19537.849595] next_to_watch.status <0>
[19537.849596] MAC Status <80783>
[19537.849597] PHY Status <796d>
[19537.849597] PHY 1000BASE-T Status <7800>
[19537.849598] PHY Extended Status <3000>
[19537.849599] PCI Status <18>
[19538.426900] irq 46: nobody cared (try booting with the "irqpoll" option)
[19538.448968] handlers:
[19538.456440] [<f82799b0>] (e1000_msix_other+0x0/0xa0 [e1000e])
[19538.475380] Disabling IRQ #46
[19538.837630] 0000:00:19.0: eth0: Detected Hardware Unit Hang:
[19538.837631] TDH <fe>
[19538.837632] TDT <f>
[19538.837633] next_to_use <f>
[19538.837634] next_to_clean <fe>
[19538.837634] buffer_info[next_to_clean]:
[19538.837635] time_stamp <497af8>
[19538.837636] next_to_watch <fe>
[19538.837636] jiffies <498065>
[19538.837637] next_to_watch.status <0>
[19538.837638] MAC Status <40080283>
[19538.837638] PHY Status <796d>
[19538.837639] PHY 1000BASE-T Status <3800>
[19538.837640] PHY Extended Status <2000>
[19538.837640] PCI Status <10>
Feb 25 19:55:27 lan-r kernel: [19538.426900] irq 46: nobody cared (try booting with the "irqpoll" option)
Feb 25 19:55:27 lan-r kernel: [19538.448932] Pid: 0, comm: swapper Not tainted 2.6.32.imq.ipset.esfq.01 #1
Feb 25 19:55:27 lan-r kernel: [19538.448934] Call Trace:
Feb 25 19:55:27 lan-r kernel: [19538.448939] [<c10693d4>] ? __report_bad_irq+0x24/0x69
Feb 25 19:55:27 lan-r kernel: [19538.448942] [<c10693db>] ? __report_bad_irq+0x2b/0x69
Feb 25 19:55:27 lan-r kernel: [19538.448945] [<c1069500>] ? note_interrupt+0xe7/0x13f
Feb 25 19:55:27 lan-r kernel: [19538.448947] [<c106998d>] ? handle_edge_irq+0xbe/0xe6
Feb 25 19:55:27 lan-r kernel: [19538.448951] [<c1004b9f>] ? handle_irq+0x17/0x1b
Feb 25 19:55:27 lan-r kernel: [19538.448954] [<c1004421>] ? do_IRQ+0x38/0x89
Feb 25 19:55:27 lan-r kernel: [19538.448956] [<c1003070>] ? common_interrupt+0x30/0x38
Waiting for data... (interrupt to abort)
[19539.846252] 0000:0a:00.0: eth1: Detected Hardware Unit Hang:
[19539.846253] TDH <c5>
[19539.846254] TDT <4>
[19539.846254] next_to_use <4>
[19539.846255] next_to_clean <c5>
[19539.846256] buffer_info[next_to_clean]:
[19539.846256] time_stamp <497af8>
[19539.846257] next_to_watch <c5>
[19539.846258] jiffies <498162>
[19539.846258] next_to_watch.status <0>
[19539.846259] MAC Status <80783>
[19539.846260] PHY Status <796d>
[19539.846260] PHY 1000BASE-T Status <7800>
[19539.846261] PHY Extended Status <3000>
[19539.846262] PCI Status <10>
[19540.834735] 0000:00:19.0: eth0: Detected Hardware Unit Hang:
[19540.834736] TDH <fe>
[19540.834737] TDT <f>
[19540.834738] next_to_use <f>
[19540.834738] next_to_clean <fe>
[19540.834739] buffer_info[next_to_clean]:
[19540.834740] time_stamp <497af8>
[19540.834740] next_to_watch <fe>
[19540.834741] jiffies <498259>
[19540.834742] next_to_watch.status <0>
[19540.834742] MAC Status <40080283>
[19540.834743] PHY Status <796d>
[19540.834744] PHY 1000BASE-T Status <3800>
[19540.834744] PHY Extended Status <2000>
[19540.834745] PCI Status <10>
[19541.843158] 0000:0a:00.0: eth1: Detected Hardware Unit Hang:
[19541.843159] TDH <c5>
Booting 'Debian GNU/Linux, kernel 2.6.32.imq.ipset.esfq.01'
root (hd0,0)
Filesystem type is ext2fs, partition type 0x83
kernel /boot/vmlinuz-2.6.32.imq.ipset.esfq.01 root=label=r...@lan-r console=tty0 console=ttyS0,38400 pcie_aspm=off ro
[Linux-bzImage, setup=0x3600, size=0x20be80]
initrd /boot/initrd.img-2.6.32.imq.ipset.esfq.01
[Linux-initrd @ 0x37965000, 0x68a6d6 bytes]
===================================================================
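One way to read the dumps above is to compare "jiffies" against
"time_stamp" and check whether the ring pointers ever move. The sketch
below does that arithmetic for the four eth0 dumps, using values copied
by hand from the log; the helper names are made up for illustration and
are not part of e1000e.

```python
# Values copied from the eth0 "Detected Hardware Unit Hang" dumps above.
# All numbers are hexadecimal, as printed by the driver.
dumps = [
    {"tdh": 0xfe, "tdt": 0x0f, "time_stamp": 0x497af8, "jiffies": 0x497c7d},
    {"tdh": 0xfe, "tdt": 0x0f, "time_stamp": 0x497af8, "jiffies": 0x497e71},
    {"tdh": 0xfe, "tdt": 0x0f, "time_stamp": 0x497af8, "jiffies": 0x498065},
    {"tdh": 0xfe, "tdt": 0x0f, "time_stamp": 0x497af8, "jiffies": 0x498259},
]


def stall_ages(dumps):
    """Jiffies elapsed between time_stamp and each dump (hypothetical helper)."""
    return [d["jiffies"] - d["time_stamp"] for d in dumps]


def ring_is_frozen(dumps):
    """True if TDH, TDT and time_stamp are identical in every dump."""
    first = (dumps[0]["tdh"], dumps[0]["tdt"], dumps[0]["time_stamp"])
    return all((d["tdh"], d["tdt"], d["time_stamp"]) == first for d in dumps)


ages = stall_ages(dumps)
print(ages)                   # age of the stuck descriptor, in jiffies
print(ring_is_frozen(dumps))  # head/tail never advance between dumps
```

The ages grow by 500 jiffies between dumps that are about 2 seconds
apart, which would be consistent with HZ=250, though that is only an
inference from the log and not checked against the kernel config.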
BTW, I don't remember seeing "irq 46: nobody cared (try booting with the
"irqpoll" option)" messages the last time we used 82574L+82578DM. Is it
worth trying to boot with the "irqpoll" option?
# uname -a
Linux lan-r 2.6.32.imq.ipset.esfq.01 #1 SMP Thu Feb 11 17:30:08 EET 2010 i686 GNU/Linux
# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  0:         83          0          0          7  IO-APIC-edge     timer
  3:          0          0          0          7  IO-APIC-edge     serial
  4:          0          0          0        916  IO-APIC-edge     serial
  8:          0          0          0          8  IO-APIC-edge     rtc0
  9:          0          0          0          0  IO-APIC-fasteoi  acpi
 14:          0          0          0          0  IO-APIC-edge     ide0
 15:          0          0          0          0  IO-APIC-edge     ide1
 18:          0          0          0      16314  IO-APIC-fasteoi  ata_piix
 21:          0         28          0          0  IO-APIC-fasteoi  ehci_hcd:usb1
 22:          0          0          0          0  IO-APIC-fasteoi  ata_piix
 23:          0          0         69          0  IO-APIC-fasteoi  ehci_hcd:usb2
 24:     751416          0          0          0  HPET_MSI-edge    hpet2
 25:          0   12605250          0          0  HPET_MSI-edge    hpet3
 26:          0          0     179257          0  HPET_MSI-edge    hpet4
 27:          0          0          0    5374014  HPET_MSI-edge    hpet5
 29:          0          0          0          0  PCI-MSI-edge     aerdrv
 44:          0    9677646          0          0  PCI-MSI-edge     eth0
 45:          0          0          0    7134650  PCI-MSI-edge     eth1-Q0
 46:     767310          0          0          0  PCI-MSI-edge     eth1
NMI:          0          0          0          0  Non-maskable interrupts
LOC:        324        274        352        287  Local timer interrupts
SPU:          0          0          0          0  Spurious interrupts
PMI:          0          0          0          0  Performance monitoring interrupts
PND:          0          0          0          0  Performance pending work
RES:       1784      25443       1211      60097  Rescheduling interrupts
CAL:         86         92         82         31  Function call interrupts
TLB:       5498       3695       5237       4420  TLB shootdowns
TRM:          0          0          0          0  Thermal event interrupts
THR:          0          0          0          0  Threshold APIC interrupts
MCE:          0          0          0          0  Machine check exceptions
MCP:          8          8          8          8  Machine check polls
ERR:          0
MIS:          0
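Since the kernel complained about IRQ 46 (eth1), one thing that might
help is comparing two /proc/interrupts snapshots to see which counters
climb fastest. Below is a minimal sketch of that idea. The function
names are hypothetical, and the "after" snapshot is invented purely to
exercise the code; only the "before" rows come from the output above.

```python
def parse_interrupts(text):
    """Map each IRQ label to its count summed over all CPUs."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())      # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if ":" not in line:           # skip wrapped continuation lines
            continue
        label, _, rest = line.partition(":")
        total = 0
        for field in rest.split()[:ncpu]:
            if not field.isdigit():   # reached the type/name columns
                break
            total += int(field)
        totals[label.strip()] = total
    return totals


def busiest(before, after, top=3):
    """Return the IRQs whose counts grew most between two snapshots."""
    deltas = {k: after[k] - before.get(k, 0) for k in after}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)[:top]


# "before" rows are taken from the output above.
before = parse_interrupts("""
 CPU0 CPU1 CPU2 CPU3
 44: 0 9677646 0 0 PCI-MSI-edge eth0
 45: 0 0 0 7134650 PCI-MSI-edge eth1-Q0
 46: 767310 0 0 0 PCI-MSI-edge eth1
ERR: 0
""")
# "after" is an invented snapshot, NOT measured on the real machine.
after = parse_interrupts("""
 CPU0 CPU1 CPU2 CPU3
 44: 0 9677700 0 0 PCI-MSI-edge eth0
 45: 0 0 0 7134700 PCI-MSI-edge eth1-Q0
 46: 967310 0 0 0 PCI-MSI-edge eth1
ERR: 0
""")
print(busiest(before, after))
```

On the live box the two snapshots would come from reading
/proc/interrupts a few seconds apart instead of from pasted text.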
# ethtool -i eth0
driver: e1000e
version: 1.1.2-NAPI
firmware-version: 0.9-2
bus-info: 0000:00:19.0
# ethtool -i eth1
driver: e1000e
version: 1.1.2-NAPI
firmware-version: 1.9-0
bus-info: 0000:0a:00.0
# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
large receive offload: off
# ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
large receive offload: off
# lspci
00:00.0 Host bridge: Intel Corporation Core Processor DMI (rev 11)
00:05.0 PCI bridge: Intel Corporation Core Processor PCI Express Root Port 3 (rev 11)
00:08.0 System peripheral: Intel Corporation Core Processor System Management Registers (rev 11)
00:08.1 System peripheral: Intel Corporation Core Processor Semaphore and Scratchpad Registers (rev 11)
00:08.2 System peripheral: Intel Corporation Core Processor System Control and Status Registers (rev 11)
00:08.3 System peripheral: Intel Corporation Core Processor Miscellaneous Registers (rev 11)
00:10.0 System peripheral: Intel Corporation Core Processor QPI Link (rev 11)
00:10.1 System peripheral: Intel Corporation Core Processor QPI Routing and Protocol Registers (rev 11)
00:19.0 Ethernet controller: Intel Corporation 82578DM Gigabit Network Connection (rev 05)
00:1a.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1c.0 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 1 (rev 05)
00:1c.4 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 5 (rev 05)
00:1c.6 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 7 (rev 05)
00:1c.7 PCI bridge: Intel Corporation 5 Series/3400 Series Chipset PCI Express Root Port 8 (rev 05)
00:1d.0 USB Controller: Intel Corporation 5 Series/3400 Series Chipset USB2 Enhanced Host Controller (rev 05)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev a5)
00:1f.0 ISA bridge: Intel Corporation 3400 Series Chipset LPC Interface Controller (rev 05)
00:1f.2 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 4 port SATA IDE Controller (rev 05)
00:1f.3 SMBus: Intel Corporation 5 Series/3400 Series Chipset SMBus Controller (rev 05)
00:1f.5 IDE interface: Intel Corporation 5 Series/3400 Series Chipset 2 port SATA IDE Controller (rev 05)
01:00.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI Express Switch (rev 0e)
02:02.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI Express Switch (rev 0e)
02:04.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI Express Switch (rev 0e)
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
06:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
06:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
0a:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
0b:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G200e [Pilot] ServerEngines (SEP1) (rev 02)
--
Покотиленко Костик <[email protected]>