Hey Brian,

I'll have to re-install the system from scratch before I can answer some of your questions; I'll get started on that this evening. What I was hoping for in the first instance was a sanity check of our installation method. As for the OFED stack, we are using the latest official software stack supplied by Voltaire. The reason is that there is more to OFED than just the kernel modules: it also includes many libraries and tools, plus the latest firmware for the cards. It is what the customer has asked for, and it is what the card vendor expects us to do.

We may be able to get away with OFED 1.3, but I would still like some guidance on how to install the rest of the OFED stack: do we rebuild everything from the OFED source, or can we take the Lustre-supplied kernel modules and layer the remaining pieces (libraries, tools, firmware) on top separately? As I said, sanity-checking the install procedure is important.

Finally, when I said that one configuration failed where another passed, I meant that the server locks solid or crashes, usually with no debug output to speak of (nothing in the system logs). Even when the system is up and running the Lustre kernel, an attempted clean shutdown ends in a kernel panic.

Since I need to rebuild the systems anyway, I will also try to install the packages in the order mentioned by Megan Larko, to see how that affects the installation. We have been following the instructions in the Lustre Operations Manual (v. 1.14).
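
For the rebuild, here is a minimal sketch of how the install output could be captured so that the "unknown symbol" warnings can be pasted to the list. The helper name `unknown_symbols` and the log path are hypothetical, and the match pattern is only an assumption based on the warning text quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct "unknown symbol" lines found in a log.
# The pattern is an assumption based on the warning text quoted in this thread.
unknown_symbols() {
    grep -i 'unknown symbol' "$1" | sort -u
}

# Intended use during the rebuild (not run here; the log path is an assumption):
#   rpm -ivh lustre-modules-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm 2>&1 | tee /tmp/lustre-install.log
#   unknown_symbols /tmp/lustre-install.log
```

Keeping the full log alongside the filtered lines means the complete rpm output is still available if more context is requested.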

Regards,

Malcolm.


Brian J. Murrell wrote:
On Mon, 2008-10-06 at 10:58 +0100, Malcolm Cowe wrote:
  
rpm -Uvh --force e2fsprogs-1.40.7.sun3-0redhat.x86_64.rpm
    

You should not (have to) use --force.  If you do, there is either an
operational error or a bug in our packages.  In the latter case, please
file a bug in our bugzilla.

  
rpm -ivh lustre-modules-1.6.5.1-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm # (many "unknown symbol" warnings)
    

Can you paste them here?

  
rpm -ivh lustre-ldiskfs-3.0.4-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm # (many "unknown symbol" warnings)
    

Ditto.

  
rpm -ivh --force kernel-ib-1.3-2.6.9_67.0.7.EL_lustre.1.6.5.1smp.x86_64.rpm
    

Again, you should not need to use --force.

  
We then reboot the system and load RHEL using the Lustre kernel. Now
we install the Voltaire OFED software:
    

Why?  The kernel-ib package you installed above should provide a working
OFED stack.

  
     1. Unpack the Voltaire OFED tar-ball:
        
        tar zxf VoltaireOFED-5.1.3.1_5.tgz
    

Do you really need 1.3.1?  If so, then you should not install the 1.3
kernel-ib package we provide above.  I really wonder why you need 1.3.1
though.

  
      * Lustre supplied kernel, Lustre software. No IB. MDS/MGS file
        system. FAILED.
    

Failed in what way?

  
      * Lustre supplied kernel, Lustre software, RDAC. No IB. MDS/MGS
        file system (Full Lustre FS over Ethernet). FAILED.
    

Again, in what way?

  
      * Lustre supplied kernel, Lustre software, RDAC, Voltaire OFED.
        EXT-3 file system. FAILED.
    

Ditto.

  
      * Lustre supplied kernel, Lustre software, RDAC, Voltaire OFED.
        MDS/MGS file system (Full Lustre FS over IB). FAILED.
    

And Ditto again.

You have to provide more details than just "FAILED" if we are to try to
help diagnose a problem.

  
Our findings indicate that there is a problem within the binary
distribution of Lustre.
    

I think that many of our users use it as is, so it cannot be all that
bad.

  
This may be due to the fact that we are applying the 2.6.9-67 RHEL
kernel to a platform based upon 2.6.9-55,
    

That shouldn't be a problem in and of itself.

b.

  

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss

--

Malcolm Cowe
Solutions Integration Engineer

Sun Microsystems, Inc.
Blackness Road
Linlithgow, West Lothian EH49 7LR UK
Phone: x73602 / +44 1506 673 602
Email: [EMAIL PROTECTED]