Hi Folks,

We are trying to create a small Lustre environment on behalf of a customer. There are two X4200m2 MDS servers, both dual-attached to an STK 6140 array over FC in an active-passive arrangement with a single shared volume; Heartbeat is used to coordinate file system failover. There is a single X4500 OSS server, the storage for which is split into six OSTs. Finally, we have two X4600m2 clients, just for kicks. All systems are connected together over Ethernet and InfiniBand, with the IB network used for Lustre, and every system is running RHEL 4.5 AS. The X4500 OST volumes are created using software RAID, while the X4200m2 MDT is accessed using DM Multipath.

We downloaded the Lustre binary packages from Sun's web site and installed them onto each of the servers. Unfortunately, the resulting system is very unstable and is prone to lock-ups on the servers (uptimes are measured in hours). These lock-ups happen without warning, and with very little, if any, debug information in the system logs. We have also observed the servers locking up on shutdown (kernel panics).
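For reference, pointing LNET at the IB fabric on each node comes down to a single modprobe.conf entry; the interface name ib0 below is an assumption about our setup, not copied from our configs:

    # /etc/modprobe.conf (sketch; ib0 assumed to be the Lustre-facing IB interface)
    options lnet networks=o2ib0(ib0)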
Based on the documentation in the Lustre operations manual, we installed the RPMs as follows:

    rpm -Uvh --force e2fsprogs-1.40.7.sun3-0redhat.x86_64.rpm
    rpm -ivh kernel-lustre-smp-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64.rpm
    rpm -ivh kernel-lustre-source-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64.rpm
    rpm -ivh lustre-modules-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm    # (many "unknown symbol" warnings)
    rpm -ivh lustre-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm
    rpm -ivh lustre-source-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm
    rpm -ivh lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm      # (many "unknown symbol" warnings)
    mv /etc/init.d/openibd /etc/init.d/openibd.rhel4default
    rpm -ivh --force kernel-ib-1.3-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm
    cp /etc/init.d/openibd /etc/init.d/openibd.lustre.1.6.5.1

We then reboot the system and load RHEL using the Lustre kernel. Now we install the Voltaire OFED software:
Create the MGS/MDT Lustre Volume:
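A minimal sketch of what this step looks like on our hardware, assuming a combined MGS/MDT on the multipath device /dev/dm-0 and the fsname "lustre" (both the fsname and the mount point are placeholder assumptions, not verbatim from our notes):

    # Format the shared multipath device as a combined MGS/MDT
    # (fsname "lustre" is an assumed placeholder)
    mkfs.lustre --fsname=lustre --mgs --mdt /dev/dm-0

    # Mount it via the lustre fs type to start the MGS/MDT services
    mkdir -p /mnt/mdt
    mount -t lustre /dev/dm-0 /mnt/mdt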
The cabling has been checked and verified. So we re-built the system from scratch and applied only Sun's RDAC modules and Voltaire OFED to the stock RHEL 4.5 kernel (2.6.9-55.ELsmp). We removed the second MDS from the hardware configuration and did not install Heartbeat. The shared storage was re-formatted as a regular ext3 file system using the DM multipathing device, /dev/dm-0, and mounted onto the host. Running I/O tests against the mounted file system over an extended period did not elicit a single error or warning message in the log related to the multipathing or the SCSI device.

Once we were confident that the system was running in a consistent and stable manner, we re-installed the Lustre packages, omitting the kernel-ib packages. We had to re-build and re-install the RDAC support as well. This means that the system has support for the Lustre file system but no InfiniBand support at all. /etc/modprobe.conf is updated such that the lnet networks option is set to "tcp". The MDS/MGS volume is recreated on the DM device. We have tried the following configurations on the X4200m2:
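Concretely, the TCP-only fallback amounts to something like the following (eth0 as the Lustre-facing interface is an assumption about our wiring):

    # /etc/modprobe.conf -- LNET over TCP only, no IB loaded
    options lnet networks=tcp0(eth0)

    # Re-create the MGS/MDT on the DM device, wiping the previous format
    # (fsname "lustre" is an assumed placeholder)
    mkfs.lustre --reformat --fsname=lustre --mgs --mdt /dev/dm-0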
Regards,
Malcolm.
--
_______________________________________________ Lustre-discuss mailing list [email protected] http://lists.lustre.org/mailman/listinfo/lustre-discuss

