Hi all,

First post to the list. I'm Kumar Nachiketa, a storage practitioner running a small personal lab on NVIDIA DGX Spark hardware (Grace UMA workstation, ARM64, Ubuntu 24.04 LTS, kernel 6.17.0-1014-nvidia). This is a personal-capacity project on my own hardware and is not affiliated with my employer.
I tried to self-register at jira.whamcloud.com, but the registration page redirects to a "contact administrators" form that isn't configured, so I'm filing here on the list instead. I'm happy to refile in Jira if a maintainer can help me get an account, or if a triager prefers to relay it there. This is the first of three bugs, which I'll send separately.

Summary
-------
On Lustre master, ./configure --with-zfs=<zfs-source-root> writes KBUILD_EXTRA_SYMBOLS pointing at <zfs-source-root>/Module.symvers, but OpenZFS 2.4+ relocated this file to <zfs-source-root>/module/Module.symvers as part of its build-tree reorganization. The osd-zfs build therefore proceeds without a valid Module.symvers reference. kbuild emits

    WARNING: Symbol version dump "<path>/Module.symvers" is missing

and the resulting osd_zfs.ko has unresolved symbols and fails to load (insmod reports "Unknown symbol" errors for zfs/spl symbols).

Environment
-----------
Kernel:   6.17.0-1014-nvidia (NVIDIA-signed Ubuntu kernel)
OS:       Ubuntu 24.04.1 LTS ARM64
OpenZFS:  2.4.1 (release tag)
Lustre:   master @ 805cece6747f442449f32a1d25a8b8a03b230875
Hardware: NVIDIA DGX Spark (Grace UMA workstation, ARM64)

Reproduction
------------
Step 1: Build OpenZFS 2.4:

    git clone https://github.com/openzfs/zfs.git && cd zfs
    git checkout zfs-2.4.1
    ./autogen.sh
    ./configure --with-linux=/lib/modules/$(uname -r)/build \
                --with-linux-obj=/lib/modules/$(uname -r)/build
    make -j$(nproc)

At this point Module.symvers is at module/Module.symvers, NOT at the source root:

    ls Module.symvers          # -> ENOENT
    ls module/Module.symvers   # -> present

Step 2: Build Lustre against it:

    cd ../
    git clone https://git.whamcloud.com/fs/lustre-release.git
    cd lustre-release
    git checkout 805cece6747f442449f32a1d25a8b8a03b230875
    sh autogen.sh
    ./configure --with-linux=/lib/modules/$(uname -r)/build \
                --with-zfs=/path/to/zfs \
                --disable-ldiskfs \
                --with-o2ib=/lib/modules/$(uname -r)/build
    make -j$(nproc)

The osd-zfs build emits the symbol-version-dump warning, and the resulting osd_zfs.ko fails to load because its zfs/spl symbols cannot be resolved.
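To check which layout a given OpenZFS build tree has before running Lustre's ./configure, a minimal sketch follows. zfs_symvers_layout is a hypothetical helper name (not part of Lustre or OpenZFS); it only inspects the two candidate paths described above and assumes an in-tree OpenZFS build.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not part of Lustre or OpenZFS).
# Prints which Module.symvers layout an OpenZFS build tree has:
#   root        -> <root>/Module.symvers exists (pre-2.4 layout, or the
#                  legacy-path symlink workaround is already in place)
#   module-only -> only <root>/module/Module.symvers exists (2.4+ layout,
#                  which Lustre master at the commit above does not find)
#   missing     -> neither exists (OpenZFS not built, or built elsewhere)
zfs_symvers_layout() {
    zfs_root=$1
    # test -f follows symlinks, so a dangling symlink does not count.
    if [ -f "$zfs_root/Module.symvers" ]; then
        echo "root"
    elif [ -f "$zfs_root/module/Module.symvers" ]; then
        echo "module-only"
    else
        echo "missing"
        return 1
    fi
}
```

Running zfs_symvers_layout /path/to/zfs after Step 1 above should print module-only on an unpatched 2.4.x tree, which is the case where Lustre's ./configure misses the file.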
Expected behavior
-----------------
./configure --with-zfs=<path> resolves the OpenZFS symbol-versions file regardless of whether the OpenZFS source tree uses the pre-2.4 layout (<path>/Module.symvers) or the 2.4+ layout (<path>/module/Module.symvers).

Actual behavior
---------------
KBUILD_EXTRA_SYMBOLS is unconditionally written with the pre-2.4 path. osd-zfs builds without ZFS symbol-version data, and the resulting kernel module cannot resolve its zfs/spl symbols at load time.

Workaround (verified working)
-----------------------------
After the OpenZFS build completes and before running Lustre's ./configure, create a symlink at the legacy location:

    ln -sf module/Module.symvers <zfs-source-root>/Module.symvers

Lustre's ./configure then resolves Module.symvers correctly, osd-zfs builds cleanly, and osd_zfs.ko loads. I validated this end-to-end (Lustre filesystem mounted, I/O measured) with the reproduction kit linked below.

Suggested fix
-------------
./configure should probe both candidate paths and use whichever exists:

    if test -f "$with_zfs/module/Module.symvers"; then
        ZFS_SYMBOLS_PATH="$with_zfs/module/Module.symvers"
    elif test -f "$with_zfs/Module.symvers"; then
        ZFS_SYMBOLS_PATH="$with_zfs/Module.symvers"
    else
        AC_MSG_ERROR([cannot locate ZFS Module.symvers under $with_zfs])
    fi

(The same probe could equally live in the osd-zfs Kbuild that consumes the variable.)

Reference
---------
Public reproduction kit (build cascade documented end-to-end):
https://github.com/knachiketa04/aihomelab/tree/main/artifacts/training/lustre-on-uma-workstations/reproduce

Thanks,
Kumar
[email protected]

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
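For anyone scripting the symlink workaround, a minimal idempotent sketch follows. ensure_legacy_symvers is a hypothetical helper (not part of Lustre or OpenZFS); it assumes the same in-tree OpenZFS layout as above and creates the legacy-path symlink only when the 2.4+ file exists and the legacy path does not, so re-running it is harmless.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre or OpenZFS): apply the
# legacy-path symlink workaround only when it is actually needed.
ensure_legacy_symvers() {
    zfs_root=$1
    # Pre-2.4 tree, or the workaround was already applied: nothing to do.
    # (test -f follows symlinks, so a dangling link falls through.)
    if [ -f "$zfs_root/Module.symvers" ]; then
        return 0
    fi
    # No 2.4+ file either: OpenZFS has not been built in this tree.
    if [ ! -f "$zfs_root/module/Module.symvers" ]; then
        echo "no Module.symvers under $zfs_root; build OpenZFS first" >&2
        return 1
    fi
    # Relative target, as in the manual workaround; -f also replaces a
    # dangling symlink left over from an earlier attempt.
    ln -sf module/Module.symvers "$zfs_root/Module.symvers"
}
```

It would be run as ensure_legacy_symvers /path/to/zfs after the OpenZFS make and before Lustre's ./configure. A relative symlink target is used so the link stays valid if the whole OpenZFS tree is moved.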
