(couldn't decide whether to top-post or bottom-post here, so I deleted the whole original message)
We have just upgraded our Rocks cluster to the CentOS 5.5 RPMs, which include a complete OFED stack (v1.4.2?), so we decided to ditch our own self-compiled OFED 1.4.1. We then ran into the same problem with openibd hanging on shutdown.

After a futile attempt to inject a lustre-unload-modules service between netfs and openibd to run lustre_rmmod, I tried to hack modprobe.conf to eject the Lustre modules by inserting this:

    remove rdma_cm /usr/sbin/lustre_rmmod && /sbin/modprobe -r --ignore-remove rdma_cm

This didn't work either, because the openibd service script uses rmmod instead of modprobe -r (aargghh), and rmmod does not consult modprobe.conf, so the remove rule is never triggered.

So the solution that seems to work is to disable openibd (chkconfig openibd off) and let network initialization take care of loading the right modules by putting this into modprobe.conf:

    alias ib0 ib_ipoib
    install ib_ipoib modprobe mlx4_ib && /sbin/modprobe --ignore-install ib_ipoib

Network startup will then load the right IB modules, and the netfs service will automatically load the Lustre modules when it mounts the Lustre partitions. The downside might be that you will not get a clean unload of either the Lustre or the OFED modules on shutdown/reboot. If you run different hardware than we do, you might have to replace mlx4_ib with whatever module your hardware needs.

(wasted two days on this, sometimes I make really good use of taxpayers' money...)

r.
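
For anyone who wants to script the modprobe.conf part of the workaround, here is a minimal sketch. The helper name apply_ib_workaround and the idea of passing the config file path and HCA module as arguments are illustrative assumptions, not part of the original setup. It only appends the two lines quoted above and does nothing if the install rule is already there.

```shell
#!/bin/sh
# Sketch only: append the IPoIB alias/install lines to a modprobe.conf-style
# file, once. apply_ib_workaround is a hypothetical helper name.
#   $1 = path to the config file (on CentOS 5 this would be /etc/modprobe.conf)
#   $2 = HCA driver module, defaults to mlx4_ib; change it to match your hardware
apply_ib_workaround() {
    conf="$1"
    hca="${2:-mlx4_ib}"

    # Do nothing if an install rule for ib_ipoib is already present.
    if grep -q '^install ib_ipoib ' "$conf" 2>/dev/null; then
        return 0
    fi

    # Append the same two lines given in the message above.
    cat >> "$conf" <<EOF
alias ib0 ib_ipoib
install ib_ipoib modprobe $hca && /sbin/modprobe --ignore-install ib_ipoib
EOF
}

# Intended use on a real node (run as root, after backing up the file):
#   chkconfig openibd off
#   apply_ib_workaround /etc/modprobe.conf
```

Trying it on a copy of modprobe.conf first is probably wise, since a broken install rule can keep ib0 from coming up at boot.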
