Dear folks,

We are installing a large diskless cluster using CentOS 5.1. The hardware is pretty new - Supermicro X7DWT boards with Harpertown CPUs. Unfortunately we have some PXE-related problems, described by the following scenario:

1) Set up DHCP, TFTP and NFS on a server, prepare PXE kernel and initrd - fine.
2) Start up the node using PXE for the first time - fine.
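3) Reboot the node - PXE boot fails on every subsequent attempt. We see that the server receives the DHCP requests and answers them, but the node never completes the exchange: it keeps sending DHCPDISCOVER and never follows an offer with a DHCPREQUEST, so no DHCPACK is ever logged. A typical DHCP log: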
Jan  5 09:14:34 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
Jan  5 09:14:34 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1
Jan  5 09:14:36 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
Jan  5 09:14:36 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1
Jan  5 09:14:40 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
Jan  5 09:14:40 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1
Jan  5 09:14:48 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1
Jan  5 09:14:48 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1

4) Nothing like a DHCP server restart, node reset, or node power off/on helps.
5) The only thing that lets the system boot over PXE again is to run the "bmc reset cold" command on the node using ipmitool - yes, we have an IPMI card sharing the same Ethernet interface. After that we can boot CentOS again.
6) Once Linux is loaded, if we restart the node using "bmc power cycle" instead of reboot or shutdown, the node boots the next time without problems.
7) There are no problems with the second GbE interface (the one without IPMI).
8) So our guess is that Linux, on reboot, leaves the Ethernet device in some state that causes brain damage for the IPMI+PXE combination. We tried playing with some e1000 driver options, and we also tried the latest Intel driver - nothing helps.

Do you have any idea what goes wrong? Any help will be much appreciated. A system summary is below:

[EMAIL PROTECTED] ~]# uname -a
Linux node-05-03 2.6.18-53.1.4.el5 #1 SMP Fri Nov 30 00:45:55 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

[EMAIL PROTECTED] ~]# lspci
00:00.0 Host bridge: Intel Corporation Memory Controller Hub (rev 20)
00:01.0 PCI bridge: Intel Corporation PCI Express Port 1 (rev 20)
00:05.0 PCI bridge: Intel Corporation PCI Express Port 5 (rev 20)
00:07.0 PCI bridge: Intel Corporation PCI Express Port 7 (rev 20)
00:0f.0 System peripheral: Intel Corporation DMA/DCA Engine (rev 20)
00:10.0 Host bridge: Intel Corporation FSB Registers (rev 20)
00:10.1 Host bridge: Intel Corporation FSB Registers (rev 20)
00:10.2 Host bridge: Intel Corporation FSB Registers (rev 20)
00:10.3 Host bridge: Intel Corporation FSB Registers (rev 20)
00:10.4 Host bridge: Intel Corporation FSB Registers (rev 20)
00:11.0 Host bridge: Intel Corporation Unknown device 4031 (rev 20)
00:15.0 Host bridge: Intel Corporation FBD Registers (rev 20)
00:15.1 Host bridge: Intel Corporation FBD Registers (rev 20)
00:16.0 Host bridge: Intel Corporation FBD Registers (rev 20)
00:16.1 Host bridge: Intel Corporation FBD Registers (rev 20)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09)
00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09)
01:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR] (rev a0)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01)
02:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01)
03:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01)
03:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01)
05:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
05:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
08:01.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
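
In case it helps anyone check a whole cluster for the same symptom, here is a minimal Python sketch (not something from our actual setup) that scans dhcpd syslog lines and reports MAC addresses that were sent a DHCPOFFER but never came back with a DHCPREQUEST. The function name stalled_clients and the regular expression are illustrative assumptions based only on the log format quoted in this message; other dhcpd versions or syslog configurations may log differently.

```python
import re
from collections import defaultdict

# Matches dhcpd syslog lines in the format quoted in this message, e.g.
#   "... dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1"
# The optional "[pid]" part is an assumption for syslog setups that add it.
LINE_RE = re.compile(
    r"dhcpd(?:\[\d+\])?: (DHCP[A-Z]+) .*?\b((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b",
    re.IGNORECASE,
)


def stalled_clients(log_lines):
    """Return sorted MACs that were offered a lease but never sent DHCPREQUEST."""
    seen = defaultdict(set)  # MAC address -> set of DHCP message types logged
    for line in log_lines:
        match = LINE_RE.search(line)
        if match:
            seen[match.group(2).lower()].add(match.group(1).upper())
    return sorted(
        mac
        for mac, kinds in seen.items()
        if "DHCPOFFER" in kinds and "DHCPREQUEST" not in kinds
    )


if __name__ == "__main__":
    sample = [
        "Jan  5 09:14:34 shoffner dhcpd: DHCPDISCOVER from 00:30:48:7e:24:a6 via eth1",
        "Jan  5 09:14:34 shoffner dhcpd: DHCPOFFER on 10.1.5.2 to 00:30:48:7e:24:a6 via eth1",
    ]
    for mac in stalled_clients(sample):
        print("stalled after DHCPOFFER:", mac)
```

Any iterable of lines works as input, so an open log file can be passed directly; which file dhcpd logs to depends on the local syslog configuration.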

Thanks in advance,
Andrey
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
