You must rebuild Lustre if you replace OFED.
Kevin
On Jun 9, 2011, at 4:55 PM, Edward Walter <[email protected]> wrote:
Thanks for all of the advice here. We seem to be running into a
hiccup using Lustre 1.8.4 with O2IB and OFED 1.5.1.
First of all, our Lustre servers are all up and running fine (using
the vendor OFED, 1.4.1). Our trouble is all on the client side.
We want to use a newer OFED (1.5.1) to potentially enable NFS
over RDMA (we have NFS servers in addition to Lustre).
We installed the current Lustre 1.8.4 rpms from Sun/Oracle:
kernel-2.6.18-194.3.1.el5_lustre.1.8.4
lustre-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
lustre-modules-1.8.4-2.6.18_194.3.1.el5_lustre.1.8.4
kernel-devel-2.6.18-194.3.1.el5_lustre.1.8.4
kernel-headers-2.6.18-194.3.1.el5_lustre.1.8.4
We rebooted with kernel-2.6.18-194.3.1.el5_lustre.1.8.4.
Next we downloaded the OFED 1.5.1 sources and built the basic and
hpc packages. These built and installed without incident. I don't
believe the OpenFabrics group provides binary RPMs; otherwise, we
would have used them.
Here are the lustre/IB lines from our modprobe.conf:
alias ib0 ib_ipoib
alias net-pf-27 ib_sdp
options lnet networks=o2ib
And our fstab:
172.16.1.3@o2ib:172.16.1.4@o2ib:/data /lustre lustre defaults,_netdev,localflock 0 0
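For readers unfamiliar with the device field in that fstab entry, here is a minimal Python sketch that takes it apart. It assumes the usual Lustre form of one or more colon-separated server NIDs followed by ":/" and the filesystem name, with each NID written as address@network. The function names are hypothetical helpers for illustration only, not part of any Lustre tool.

```python
def parse_fstab_line(line):
    """Split one fstab entry into its six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fstab fields, got {len(fields)}")
    device, mountpoint, fstype, options, dump, passno = fields
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options.split(","),
        "dump": int(dump),
        "passno": int(passno),
    }


def parse_lustre_device(spec):
    """Split '<nid>[:<nid>...]:/<fsname>' into (list of NIDs, fsname)."""
    nids_part, sep, fsname = spec.rpartition(":/")
    if not sep or not nids_part or not fsname:
        raise ValueError(f"not a Lustre device spec: {spec!r}")
    return nids_part.split(":"), fsname


def nid_network(nid):
    """Return the LNET network of a NID, e.g. 'o2ib' for '172.16.1.3@o2ib'."""
    _addr, sep, net = nid.partition("@")
    if not sep or not net:
        raise ValueError(f"NID has no @network part: {nid!r}")
    return net


# The fstab entry from this thread, written on a single line.
entry = parse_fstab_line(
    "172.16.1.3@o2ib:172.16.1.4@o2ib:/data /lustre lustre "
    "defaults,_netdev,localflock 0 0"
)
nids, fsname = parse_lustre_device(entry["device"])
print(entry["mountpoint"], fsname, nids, {nid_network(n) for n in nids})
```

Both NIDs name the o2ib network, which is why the mount depends on the ko2iblnd module being loadable on the client.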
OpenIB is working properly: we have a subnet manager running and can
ping our Lustre OSS and MDS servers over IB.
Trying to mount /lustre generates the following error:
mount.lustre: mount 172.16.1.3@o2ib:172.16.1.4@o2ib:/data at /lustre failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems
Note 'alias lustre llite' should be removed from modprobe.conf
dmesg shows that the ko2iblnd module cannot be loaded:
Lustre: OBD class driver, http://www.lustre.org/
Lustre: Lustre Version: 1.8.4
Lustre: Build Version: 1.8.4-20100723170646-PRISTINE-2.6.18-194.3.1.el5_lustre.1.8.4
ko2iblnd: disagrees about version of symbol ib_fmr_pool_unmap
ko2iblnd: Unknown symbol ib_fmr_pool_unmap
ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol rdma_resolve_addr
ko2iblnd: Unknown symbol rdma_resolve_addr
ko2iblnd: disagrees about version of symbol ib_reg_phys_mr
ko2iblnd: Unknown symbol ib_reg_phys_mr
ko2iblnd: disagrees about version of symbol ib_create_fmr_pool
ko2iblnd: Unknown symbol ib_create_fmr_pool
ko2iblnd: disagrees about version of symbol ib_dereg_mr
ko2iblnd: Unknown symbol ib_dereg_mr
ko2iblnd: disagrees about version of symbol rdma_reject
ko2iblnd: Unknown symbol rdma_reject
ko2iblnd: disagrees about version of symbol rdma_disconnect
ko2iblnd: Unknown symbol rdma_disconnect
ko2iblnd: disagrees about version of symbol rdma_resolve_route
ko2iblnd: Unknown symbol rdma_resolve_route
ko2iblnd: disagrees about version of symbol rdma_bind_addr
ko2iblnd: Unknown symbol rdma_bind_addr
ko2iblnd: disagrees about version of symbol rdma_create_qp
ko2iblnd: Unknown symbol rdma_create_qp
ko2iblnd: disagrees about version of symbol ib_destroy_cq
ko2iblnd: Unknown symbol ib_destroy_cq
ko2iblnd: disagrees about version of symbol rdma_create_id
ko2iblnd: Unknown symbol rdma_create_id
ko2iblnd: disagrees about version of symbol rdma_listen
ko2iblnd: Unknown symbol rdma_listen
ko2iblnd: disagrees about version of symbol rdma_destroy_qp
ko2iblnd: Unknown symbol rdma_destroy_qp
ko2iblnd: disagrees about version of symbol ib_query_device
ko2iblnd: Unknown symbol ib_query_device
ko2iblnd: disagrees about version of symbol ib_get_dma_mr
ko2iblnd: Unknown symbol ib_get_dma_mr
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
ko2iblnd: disagrees about version of symbol rdma_connect
ko2iblnd: Unknown symbol rdma_connect
ko2iblnd: disagrees about version of symbol ib_modify_qp
ko2iblnd: Unknown symbol ib_modify_qp
ko2iblnd: disagrees about version of symbol rdma_destroy_id
ko2iblnd: Unknown symbol rdma_destroy_id
ko2iblnd: disagrees about version of symbol rdma_accept
ko2iblnd: Unknown symbol rdma_accept
ko2iblnd: disagrees about version of symbol ib_dealloc_pd
ko2iblnd: Unknown symbol ib_dealloc_pd
ko2iblnd: disagrees about version of symbol ib_fmr_pool_map_phys
ko2iblnd: Unknown symbol ib_fmr_pool_map_phys
LustreError: 7461:0:(api-ni.c:1081:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
LustreError: 7461:0:(events.c:725:ptlrpc_init_portals()) network initialisation failed
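The "disagrees about version of symbol" lines come from the kernel's module symbol versioning: the checksum that ko2iblnd recorded for each IB/RDMA symbol at build time does not match the one exported by the modules now loaded. That is consistent with a ko2iblnd built against a different InfiniBand stack than the OFED 1.5.1 modules installed afterwards. A minimal Python sketch for summarizing such a log follows; the function names are hypothetical, and the sample text is a short excerpt of the output above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   ko2iblnd: disagrees about version of symbol ib_create_cq
MISMATCH = re.compile(r"(\S+): disagrees about version of symbol (\S+)")


def version_mismatches(dmesg_text):
    """Map each module name to the sorted symbols it disagrees about."""
    found = defaultdict(set)
    for line in dmesg_text.splitlines():
        m = MISMATCH.search(line)
        if m:
            found[m.group(1)].add(m.group(2))
    return {mod: sorted(syms) for mod, syms in found.items()}


def by_prefix(symbols):
    """Group symbol names by their leading prefix (text before the first '_')."""
    groups = defaultdict(list)
    for sym in symbols:
        groups[sym.split("_", 1)[0]].append(sym)
    return dict(groups)


# Short excerpt of the dmesg output quoted in this message.
sample = """\
ko2iblnd: disagrees about version of symbol ib_create_cq
ko2iblnd: Unknown symbol ib_create_cq
ko2iblnd: disagrees about version of symbol rdma_connect
ko2iblnd: Unknown symbol rdma_connect
ko2iblnd: disagrees about version of symbol ib_alloc_pd
ko2iblnd: Unknown symbol ib_alloc_pd
"""

mismatches = version_mismatches(sample)
for module, symbols in mismatches.items():
    print(module, by_prefix(symbols))
```

If every mismatched symbol carries an ib_ or rdma_ prefix, as in the full log, the problem is confined to the InfiniBand stack rather than the core kernel.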
Am I missing something obvious here?
Thanks much.
-Ed
On 06/05/2011 05:48 AM, Wu, Yilei wrote:
We have been using OFED 1.5.1 with Lustre 1.8.4 on a 400-node
cluster based on RHEL 5.4, with no problems at all.
One thing needs attention:
If you use the default OFED 1.5.1, just install the RPM packages;
there is no need to build either Lustre or OFED.
If you use a modified driver stack, such as BX-OFED 1.5.1, you may
in some cases need to recompile the Linux kernel with an increased
stack size, because Lustre and OFED are both stack-greedy and can
exhaust the stack, leading to system hangs.
YiLei
On Thu, Jun 2, 2011 at 1:36 AM, Kevin Van Maren <[email protected]> wrote:
OFED 1.5.1 should work fine with Lustre 1.8.4, although I believe more
people are using the in-kernel OFED now: Lustre (finally) defaulted to
the in-kernel OFED for RedHat, so it is no longer _necessary_ to build
either OFED or Lustre.
Kevin
Edward Walter wrote:
> Hi List,
>
> We're getting ready to upgrade the OS/software stack on one of our
> clusters and I'm looking at which Lustre and OFED versions will work
> best.
>
> It looks like the changelog for 1.8.4 and the compatibility matrix
> have conflicting information.
>
> The Lustre compatibility matrix indicates that for Lustre 1.8.4, the
> highest OFED revision with o2iblnd support is 1.4.2:
> http://wiki.lustre.org/index.php/Lustre_Release_Information
>
> The changelog for 1.8.4 indicates that o2iblnd is supported with
> OFED 1.5.1:
> http://wiki.lustre.org/index.php/Change_Log_1.8#Changes_from_v1.8.3_to_v1.8.4
>
> Can someone clarify whether 1.8.4 supports o2iblnd with OFED 1.5.1?
> Are there any pitfalls to this configuration? Has anyone found any
> instabilities with this configuration?
>
> Thanks much.
>
> -Ed Walter
> Carnegie Mellon University
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>