Hi all,

I've spent the past few days trying to build Lustre server RPMs against MOFED, and I keep hitting a dependency problem: the kmod-lustre package requires various ksym symbols that none of the available MOFED RPMs provide.

I'm building here with:
Lustre 2.12.5
kernel 3.10.0-1127.8.2.el7_lustre.x86_64
MOFED 4.9-0.1.7.0 (although I've had the same result with kernel 3.10.0-1127.18.2.el7_lustre.x86_64 and MOFED 5.1-0.6.6.0):

[user@machine lustre-release]# yum localinstall kmod-lustre-2.12.5-1.el7.x86_64.rpm kmod-lustre-osd-ldiskfs-2.12.5-1.el7.x86_64.rpm lustre-2.12.5-1.el7.x86_64.rpm lustre-osd-ldiskfs-mount-2.12.5-1.el7.x86_64.rpm
Loaded plugins: product-id, search-disabled-repos, subscription-manager
Examining kmod-lustre-2.12.5-1.el7.x86_64.rpm: kmod-lustre-2.12.5-1.el7.x86_64
Marking kmod-lustre-2.12.5-1.el7.x86_64.rpm to be installed
Examining kmod-lustre-osd-ldiskfs-2.12.5-1.el7.x86_64.rpm: kmod-lustre-osd-ldiskfs-2.12.5-1.el7.x86_64
Marking kmod-lustre-osd-ldiskfs-2.12.5-1.el7.x86_64.rpm to be installed
Examining lustre-2.12.5-1.el7.x86_64.rpm: lustre-2.12.5-1.el7.x86_64
Marking lustre-2.12.5-1.el7.x86_64.rpm to be installed
Examining lustre-osd-ldiskfs-mount-2.12.5-1.el7.x86_64.rpm: lustre-osd-ldiskfs-mount-2.12.5-1.el7.x86_64
Marking lustre-osd-ldiskfs-mount-2.12.5-1.el7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package kmod-lustre.x86_64 0:2.12.5-1.el7 will be installed
--> Processing Dependency: ksym(__ib_alloc_pd) = 0x9cbf7973 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(__ib_create_cq) = 0x89e52306 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(__rdma_accept) = 0x8de99f59 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(__rdma_create_id) = 0xb4dc7b7e for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(backport_dependency_symbol) = 0xb43a926b for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_alloc_mr_user) = 0x1fb7fcc9 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_create_fmr_pool) = 0x1f5667d3 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_dealloc_pd_user) = 0x534a2aa9 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_dereg_mr_user) = 0x02332dc6 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_destroy_cq_user) = 0x6391feb0 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_fmr_pool_map_phys) = 0xdcf9c30f for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_fmr_pool_unmap) = 0xd0481e41 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_get_dma_mr) = 0x366559bd for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_map_mr_sg) = 0x0366904f for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(ib_modify_qp) = 0x31adefba for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_bind_addr) = 0x445a242e for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_connect) = 0xaed8f42f for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_create_qp) = 0x247ddac2 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_destroy_id) = 0x7ea42958 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_destroy_qp) = 0xfa90a30a for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_disconnect) = 0x72109dd0 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_listen) = 0xff8db636 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_notify) = 0x7d20777a for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_reject) = 0x28d81cc0 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_resolve_addr) = 0x65e39e38 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_resolve_route) = 0xa3b7af34 for package: kmod-lustre-2.12.5-1.el7.x86_64
--> Processing Dependency: ksym(rdma_set_reuseaddr) = 0x11e1ebcc for package: kmod-lustre-2.12.5-1.el7.x86_64
---> Package kmod-lustre-osd-ldiskfs.x86_64 0:2.12.5-1.el7 will be installed
---> Package lustre.x86_64 0:2.12.5-1.el7 will be installed
---> Package lustre-osd-ldiskfs-mount.x86_64 0:2.12.5-1.el7 will be installed
--> Finished Dependency Resolution
Error: Package: kmod-lustre-2.12.5-1.el7.x86_64 (/kmod-lustre-2.12.5-1.el7.x86_64)
           Requires: ksym(rdma_set_reuseaddr) = 0x11e1ebcc
Error: Package: kmod-lustre-2.12.5-1.el7.x86_64 (/kmod-lustre-2.12.5-1.el7.x86_64)
           Requires: ksym(ib_fmr_pool_map_phys) = 0xdcf9c30f
Error: Package: kmod-lustre-2.12.5-1.el7.x86_64 (/kmod-lustre-2.12.5-1.el7.x86_64)
           Requires: ksym(ib_dealloc_pd_user) = 0x534a2aa9
Error: Package: kmod-lustre-2.12.5-1.el7.x86_64 (/kmod-lustre-2.12.5-1.el7.x86_64)
           Requires: ksym(backport_dependency_symbol) = 0xb43a926b
Error: Package: kmod-lustre-2.12.5-1.el7.x86_64 (/kmod-lustre-2.12.5-1.el7.x86_64)
           Requires: ksym(ib_modify_qp) = 0x31adefba
Error: Package: kmod-lustre-2.12.5-1.el7.x86_64 (/kmod-lustre-2.12.5-1.el7.x86_64)
           Requires: ksym(rdma_resolve_route) = 0xa3b7af34

... snip ...


I know this issue has come up a few times on this list in the past:

- Most recently:
(http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2019-October/016738.html)
- and I raised a similar issue two years ago the last time I was building with MOFED:
(http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2018-August/015796.html)

I'm closely following the procedure in the wiki: http://wiki.lustre.org/Compiling_Lustre

- Build patched kernel rpms, install these, particularly *-devel, removing all other kernels from build machine

- Build custom MOFED against patched kernel:
./mlnx_add_kernel_support.sh --make-tgz --verbose --yes --kernel 3.10.0-1127.8.2.el7_lustre.x86_64 --kernel-sources /usr/src/kernels/3.10.0-1127.8.2.el7_lustre.x86_64 --tmpdir /tmp --distro rhel7.8 --mlnx_ofed /root/MLNX_OFED_LINUX-4.9-0.1.7.0-rhel7.8-x86_64 --kmp

- Install custom MOFED modules
Either: yum localinstall {mlnx-ofa_kernel-[0-9].*,mlnx-ofa_kernel-devel-[0-9].*,mlnx-ofa_kernel-modules-[0-9].*}.x86_64.rpm
or
yum localinstall mlnx-ofed-all...

- Build Lustre against the MOFED IB stack:
./configure --enable-server --with-linux=/usr/src/kernels/3.10.0-1127.8.2.el7_lustre.x86_64 --with-o2ib=/usr/src/ofa_kernel/default
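
Each ksym dependency pairs a symbol name with a CRC, so one way to see which versions the --with-o2ib tree would hand to Lustre is to look the symbol up in that tree's Module.symvers. The following is only a sketch: the function name symvers_crc is made up for illustration, and it assumes the usual whitespace-separated "CRC symbol module export-type" layout of Module.symvers. I have not verified where the MOFED devel package puts that file.

```shell
# Print the CRC recorded for one exported symbol in a Module.symvers file.
# $1 = path to Module.symvers (assumed layout: CRC, symbol, module, export type)
# $2 = symbol name, e.g. rdma_connect
# symvers_crc is a hypothetical helper name, not part of Lustre or MOFED.
symvers_crc() {
    # Field 2 is the symbol name, field 1 its CRC.
    awk -v sym="$2" '$2 == sym { print $1 }' "$1"
}

# Example call (the path is an assumption, adjust to your tree):
#   symvers_crc /usr/src/ofa_kernel/default/Module.symvers rdma_connect
```

If the value printed matches the CRC in the kmod-lustre requirement, the build picked up the MOFED symbols and the question becomes which RPM advertises them.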

However, in every case I've tried, this results in a kmod-lustre package with dependencies that the rebuilt MOFED module packages do not provide:

[user@machine lustre-release]# rpm -q --requires -p kmod-lustre-2.12.5-1.el7.x86_64.rpm | grep ksym
ksym(__ib_alloc_pd) = 0x9cbf7973
ksym(__ib_create_cq) = 0x89e52306
ksym(__rdma_accept) = 0x8de99f59
ksym(__rdma_create_id) = 0xb4dc7b7e
ksym(backport_dependency_symbol) = 0xb43a926b
ksym(ib_alloc_mr_user) = 0x1fb7fcc9
ksym(ib_create_fmr_pool) = 0x1f5667d3
ksym(ib_dealloc_pd_user) = 0x534a2aa9
ksym(ib_dereg_mr_user) = 0x02332dc6
ksym(ib_destroy_cq_user) = 0x6391feb0
ksym(ib_fmr_pool_map_phys) = 0xdcf9c30f
ksym(ib_fmr_pool_unmap) = 0xd0481e41
ksym(ib_get_dma_mr) = 0x366559bd
ksym(ib_map_mr_sg) = 0x0366904f
ksym(ib_modify_qp) = 0x31adefba
ksym(rdma_bind_addr) = 0x445a242e
ksym(rdma_connect) = 0xaed8f42f
ksym(rdma_create_qp) = 0x247ddac2
ksym(rdma_destroy_id) = 0x7ea42958
ksym(rdma_destroy_qp) = 0xfa90a30a
ksym(rdma_disconnect) = 0x72109dd0
ksym(rdma_listen) = 0xff8db636
ksym(rdma_notify) = 0x7d20777a
ksym(rdma_reject) = 0x28d81cc0
ksym(rdma_resolve_addr) = 0x65e39e38
ksym(rdma_resolve_route) = 0xa3b7af34
ksym(rdma_set_reuseaddr) = 0x11e1ebcc

[user@machine MLNX_LIBS]# rpm -q --provides -p mlnx-ofa_kernel*.rpm | grep ksym
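
The two rpm queries can be combined into a single check that lists exactly which ksym requirements remain unsatisfied. This is a rough sketch rather than anything from the Lustre or MOFED tooling: missing_ksyms is a name I made up, and it expects two files holding saved `rpm -q --requires` and `rpm -q --provides` output.

```shell
# List ksym() requirements that no provides line satisfies.
# $1 = file with `rpm -q --requires -p <pkg>` output
# $2 = file with `rpm -q --provides -p <pkgs...>` output
# missing_ksyms is a hypothetical helper name.
missing_ksyms() {
    req_sorted=$(mktemp)
    prov_sorted=$(mktemp)
    # Keep only ksym lines, strip trailing whitespace, sort in the C locale
    # so that comm sees both inputs in the same order.
    grep '^ksym(' "$1" | sed 's/[[:space:]]*$//' | LC_ALL=C sort -u > "$req_sorted"
    grep '^ksym(' "$2" | sed 's/[[:space:]]*$//' | LC_ALL=C sort -u > "$prov_sorted"
    # comm -23 prints lines found only in the first input:
    # required by kmod-lustre but provided by nothing.
    LC_ALL=C comm -23 "$req_sorted" "$prov_sorted"
    rm -f "$req_sorted" "$prov_sorted"
}

# Example call:
#   rpm -q --requires -p kmod-lustre-2.12.5-1.el7.x86_64.rpm > req.txt
#   rpm -q --provides -p mlnx-ofa_kernel*.rpm > prov.txt
#   missing_ksyms req.txt prov.txt
```

Because the comparison is on the whole "symbol = CRC" line, a symbol that is provided with a different CRC is also reported as missing, which is how yum treats it too.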

As Stefane mentioned in the October thread (http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2019-October/016749.html), the only package in MOFED that appears to provide these symbols is the kmod-mlnx-ofa_kernel package, which is *only* built when KMP is supported. As far as I can tell, that is only the case when building against an *unpatched* distribution kernel.

For example:

# Searching for kmod-mlnx-ofa_kernel in the downloaded MOFED from MLNX
[user@machine ~]# find MLNX_OFED_LINUX-4.9-0.1.7.0-rhel7.8-x86_64/RPMS/COMMON -name kmod-mlnx-ofa_kernel*
MLNX_OFED_LINUX-4.9-0.1.7.0-rhel7.8-x86_64/RPMS/COMMON/kmod-mlnx-ofa_kernel-4.9-OFED.4.9.0.1.7.1.gd3d963b.rhel7u8.x86_64.rpm

# Not present in the MOFED rebuilt against the patched lustre kernel
[user@machine ~]# find MLNX_OFED_LINUX-4.9-0.1.7.0-rhel7.8-x86_64-ext/RPMS/COMMON -name kmod-mlnx-ofa_kernel*
[user@machine ~]#

Digging into the MOFED install script 'install.pl', inside MLNX_OFED_LINUX-4.9-0.1.7.0-rhel7.8-x86_64/src/MLNX_OFED_SRC-4.9-0.1.7.0.tgz, I can see why this is (lines 1401-1404 of the script):

if ($kmp and ($DISTRO =~ m/XenServer|RHEL5.2|FC|WINDRIVER6|POWERKVM|BLUENIX1/ or $kernel =~ /xs|fbk|fc|debug|lustre/)) {
    print_and_log_colored("KMP is not supported on $DISTRO. Switching to non-KMP mode", $verbose2, "RED");
    $kmp = 0;
}

So essentially, if the kernel version string contains 'lustre', KMP support is disabled and the kmod packages are not built.
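
To see in advance whether a given kernel version string would trip the kernel-name half of that test, the same unanchored alternation can be tried with grep -E. This is a sketch only: kmp_blocked_by_kernel_name is a made-up helper, and it deliberately ignores the $DISTRO half of the condition.

```shell
# Succeed (exit 0) if the kernel version string matches the alternation that
# install.pl applies to $kernel, i.e. if KMP would be switched off because of
# the kernel name alone. The $DISTRO check is not reproduced here.
# kmp_blocked_by_kernel_name is a hypothetical helper name.
kmp_blocked_by_kernel_name() {
    printf '%s\n' "$1" | grep -Eq 'xs|fbk|fc|debug|lustre'
}

# Example call:
#   kmp_blocked_by_kernel_name "$(uname -r)" && echo "KMP would be disabled"
```

Note that the pattern is unanchored, so any kernel release string containing one of those substrings anywhere is affected, not just ones ending in _lustre.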

By removing that check and rebuilding, I do indeed get a set of kmod-mlnx-* RPMs that provide the necessary symbols, e.g.:

[user@machine COMMON]# rpm -q --provides -p kmod-mlnx-ofa_kernel-4.9-OFED.4.9.0.1.7.1.gd3d963b.202008230901.rhel7u8.x86_64.rpm | grep 'ksym(rdma_connect'
ksym(rdma_connect) = 0xaed8f42f

and with this installed, I can finally install the Lustre packages correctly.

However this leaves me with a number of questions:

* Is this check that Mellanox added actually incorrect? It has been present since MOFED 4.2. Or should we not be building with KMP support at all, since this isn't a distro kernel?

* Building MOFED *without* KMP support produces the mlnx-ofa_kernel-modules package instead which contains the kernel modules. Should *this* package not provide the 'ksym' symbols that the lustre package is picking up?

* Or is there something wrong with the Lustre build scripts, picking up these ksym dependencies when it shouldn't?

* Or am I doing something completely wrong and everyone else is building Lustre servers + MOFED happily and I just need to fix my build process to match?

Apologies for such a long-winded email, but this has been driving me slightly mad the past couple of days and I'd like to get to the bottom of what's going on.

If anyone has had success building this combination (which I'm sure plenty have!) please can you let me know if you've encountered this issue, or if not what you are doing differently?

Kind regards,
Matt

--
Matt Rásó-Barnett
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org