Hi Folks,

We are trying to create a small Lustre environment on behalf of a customer. There are two X4200m2 MDS servers, both dual-attached to an STK 6140 array over FC, in an active-passive arrangement with a single shared volume. Heartbeat is used to co-ordinate file system failover. There is a single X4500 OSS, whose storage is split into six OSTs. Finally, we have two X4600m2 clients, just for kicks.

All systems are connected together over both Ethernet and InfiniBand, with the IB network carrying the Lustre traffic; every system runs RHEL 4.5 AS. The X4500 OST volumes are created using software RAID, while the X4200m2 MDT is accessed through DM Multipath. We downloaded the Lustre binary packages from Sun's web site and installed them onto each of the servers.
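For reference, the OST backing volumes on the X4500 are assembled along these lines (the member disks, RAID level, and device names below are illustrative assumptions, not our exact layout):

```shell
# Illustrative only: assemble one software-RAID volume to back an OST.
# The member disks (/dev/sd[c-h]) and RAID-5 layout are assumptions,
# not our actual X4500 configuration.
mdadm --create /dev/md0 --level=5 --raid-devices=6 \
    /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh

# Confirm the array has assembled cleanly before formatting it as an OST.
cat /proc/mdstat
```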

Unfortunately, the resulting system is very unstable and is prone to lock-ups on the servers (uptimes are measured in hours). These lock-ups happen without warning, and with very little, if any, debug information in the system logs. We have also observed the servers locking up on shutdown (kernel panics). Based on the documentation in the Lustre operations manual, we installed the RPMs as follows:

rpm -Uvh --force e2fsprogs-1.40.7.sun3-0redhat.x86_64.rpm
rpm -ivh kernel-lustre-smp-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64.rpm
rpm -ivh kernel-lustre-source-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64.rpm
rpm -ivh lustre-modules-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm # (many "unknown symbol" warnings)
rpm -ivh lustre-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm
rpm -ivh lustre-source-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm
rpm -ivh lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm # (many "unknown symbol" warnings)
mv /etc/init.d/openibd /etc/init.d/openibd.rhel4default
rpm -ivh --force kernel-ib-1.3-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm
cp /etc/init.d/openibd /etc/init.d/openibd.lustre.1.6.5.1

We then reboot the system into the Lustre-patched kernel and install the Voltaire OFED software:
  1. Copy the kernel config used to build the Lustre patched kernel into the Lustre kernel source tree:

    cp /boot/config-2.6.9-67.0.7.EL_lustre.1.6.5.1smp \
    /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1/.config

  2. Change into the Lustre kernel source tree and edit the Makefile, changing the "custom" suffix to "smp" in the EXTRAVERSION variable.
  3. Still in the Lustre kernel source tree, run these setup commands:

    make oldconfig || make menuconfig
    make include/asm
    make include/linux/version.h
    make SUBDIRS=scripts
  4. Change into the "-obj" directory and run these setup commands:

    cd /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1-obj/x86_64/smp
    ln -s /usr/src/linux-2.6.9-67.0.7.EL_lustre.1.6.5.1/include .

  5. Unpack the Voltaire OFED tar-ball:

    tar zxf VoltaireOFED-5.1.3.1_5.tgz

  6. Change to the unpacked software directory and run the installation script. To build the OFED packages with the Voltaire certified configuration, run the following commands:

    cd VoltaireOFED-5.1.3.1_5
    ./install.pl -c ofed.conf.Volt
  7. Once complete, reboot.
  8. Configure any IPoIB interfaces as required.
  9. Add the following into /etc/modprobe.conf:

    options lnet networks="o2ib0(ib0)"

  10. Load the Lustre LNET kernel module.

    modprobe lnet

  11. Start the Lustre core networking service.

    lctl network up

  12. Check the system log (/var/log/messages) for confirmation.
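Beyond the system log, the configured NIDs can also be checked directly once LNET is up. A quick sanity check might look like this (the o2ib address in the comment is illustrative; `lctl list_nids` is standard in Lustre 1.6):

```shell
# List the NIDs LNET has brought up; on this configuration we would
# expect a single o2ib entry, e.g. 192.168.0.10@o2ib (address illustrative).
lctl list_nids

# Scan the system log for LNET/o2ib messages from the most recent start-up.
grep -i -e lnet -e o2ib /var/log/messages | tail -20
```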

Create the MGS/MDT Lustre Volume:
  1. Format the MGS/MDT device.

    mkfs.lustre [ --reformat ] --fsname lfs01 --mdt --mgs [EMAIL PROTECTED] /dev/dm-0

  2. Create the MGS/MDT file system mount point.

    mkdir -p /lustre/mdt/lfs01

  3. Mount the file system. This will initiate MGS and MDT services for Lustre.

    mount -t lustre /dev/dm-0 /lustre/mdt/lfs01

With the exception of the OST volume creation, we use an equivalent process to bring the OSS online.
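For completeness, the OSS side of that equivalent process looks roughly like this for each OST. The MGS NID and backing device below are placeholders (our real NID is masked above), so treat this as a sketch of the shape, not our exact commands:

```shell
# Per-OST equivalent of the MDT steps above. The MGS NID
# (192.168.0.10@o2ib) and the backing device (/dev/md0) are placeholders.
mkfs.lustre --fsname lfs01 --ost --mgsnode=192.168.0.10@o2ib /dev/md0

# Create the mount point and mount it, which starts the OST service.
mkdir -p /lustre/ost/lfs01-ost0
mount -t lustre /dev/md0 /lustre/ost/lfs01-ost0
```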

The cabling has been checked and verified. We then rebuilt the system from scratch and applied only Sun's RDAC modules and the Voltaire OFED to the stock RHEL 4.5 kernel (2.6.9-55.ELsmp). We removed the second MDS from the hardware configuration and did not install Heartbeat. The shared storage was reformatted as a regular ext3 file system on the DM multipath device, /dev/dm-0, and mounted on the host. Running I/O tests against the mounted file system over an extended period did not elicit a single error or warning in the log related to multipathing or the SCSI device.

Once we were confident that the system was running in a consistent and stable manner, we re-installed the Lustre packages, omitting the kernel-ib packages, and rebuilt and re-installed the RDAC support. This gives the system support for the Lustre file system but no InfiniBand support at all. /etc/modprobe.conf is updated so that the lnet networks option is set to "tcp". The MDS/MGS volume is recreated on the DM device.
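Concretely, the Ethernet-only LNET setting in /etc/modprobe.conf looks like the following (the eth0 interface name is an assumption about which NIC sits on the Lustre network):

```shell
# /etc/modprobe.conf entry for running Lustre over Ethernet only.
# eth0 is assumed to be the interface on the Lustre network.
options lnet networks="tcp0(eth0)"
```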

We have tried the following configurations on the X4200m2:
  • RHEL vanilla kernel, multipathd, RDAC; ext3 file system. PASSED.
  • RHEL vanilla kernel, multipathd, RDAC, Voltaire OFED; ext3 file system. PASSED.
  • Lustre-supplied kernel, Lustre software; no IB; MDS/MGS file system. FAILED.
  • Lustre-supplied kernel, Lustre software, RDAC; no IB; MDS/MGS file system (full Lustre FS over Ethernet). FAILED.
  • Lustre-supplied kernel, Lustre software, RDAC, Voltaire OFED; ext3 file system. FAILED.
  • Lustre-supplied kernel, Lustre software, RDAC, Voltaire OFED; MDS/MGS file system (full Lustre FS over IB). FAILED.
Our findings indicate that there is a problem within the binary distribution of Lustre. This may be because we are running the 2.6.9-67-based Lustre kernel on a platform validated against 2.6.9-55, or it may be a more subtle issue in the interaction with the underlying hardware. We could use some advice on how best to proceed, since our deadline fast approaches. For example, is our build process, as documented above, clean? Currently, we're looking at building from source, to see if this results in a more stable environment.

Regards,

Malcolm.

--

Malcolm Cowe
Solutions Integration Engineer

Sun Microsystems, Inc.
Blackness Road
Linlithgow, West Lothian EH49 7LR UK
Phone: x73602 / +44 1506 673 602
Email: [EMAIL PROTECTED]
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss