Hi all,

First post to the list. I'm Kumar Nachiketa, a storage practitioner
running a small personal lab on NVIDIA DGX Spark hardware (Grace UMA
workstation, ARM64, Ubuntu 24.04 LTS, kernel 6.17.0-1014-nvidia). This
is a personal-capacity project on my own hardware, not affiliated with
my employer.

I tried jira.whamcloud.com self-signup but the registration page
redirects to a contact-administrators form that isn't configured, so
I'm filing on the list. Happy to refile in Jira if a maintainer can
help me get an account, or if a triager prefers to relay it there.

This is the first of three bugs I'll send separately.


Summary
-------

On Lustre master, ./configure --with-zfs=<zfs-source-root> sets
KBUILD_EXTRA_SYMBOLS to <zfs-source-root>/Module.symvers, but
OpenZFS 2.4+ relocated this file to <zfs-source-root>/module/Module.symvers
as part of its build-tree reorganization. The osd-zfs build then proceeds
without a usable Module.symvers for ZFS; kbuild emits

WARNING: Symbol version dump "<path>/Module.symvers" is missing

and the resulting osd_zfs.ko fails to load: insmod reports "Unknown
symbol" for the zfs/spl symbols it references.


Environment
-----------

  Kernel:    6.17.0-1014-nvidia (NVIDIA-signed Ubuntu kernel)
  OS:        Ubuntu 24.04.1 LTS ARM64
  OpenZFS:   2.4.1 (release tag)
  Lustre:    master @ 805cece6747f442449f32a1d25a8b8a03b230875
  Hardware:  NVIDIA DGX Spark (Grace UMA workstation, ARM64)


Reproduction
------------

Step 1 — Build OpenZFS 2.4:

  git clone https://github.com/openzfs/zfs.git && cd zfs
  git checkout zfs-2.4.1
  ./autogen.sh
  ./configure --with-linux=/lib/modules/$(uname -r)/build \
              --with-linux-obj=/lib/modules/$(uname -r)/build
  make -j$(nproc)

At this point Module.symvers is at module/Module.symvers, NOT at the
source root:

  ls Module.symvers          # -> ENOENT
  ls module/Module.symvers   # -> present
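
The same check can be scripted so it works for either layout. This is a
minimal sketch: zfs_symvers_path is a hypothetical helper name, not part
of OpenZFS or Lustre, and it assumes the OpenZFS build has already run.

```shell
# Print the Module.symvers path for a built OpenZFS tree, or fail.
# zfs_symvers_path is a hypothetical helper name used only in this sketch.
zfs_symvers_path() {
    zfs_root=$1
    # Try the 2.4+ location first, then the pre-2.4 location.
    for candidate in "$zfs_root/module/Module.symvers" \
                     "$zfs_root/Module.symvers"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no Module.symvers under $zfs_root (has 'make' completed?)" >&2
    return 1
}
```

Usage: zfs_symvers_path /path/to/zfs prints whichever file exists and
returns non-zero if neither does.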

Step 2 — Build Lustre against it:

  cd ../
  git clone https://git.whamcloud.com/fs/lustre-release.git
  cd lustre-release
  git checkout 805cece6747f442449f32a1d25a8b8a03b230875
  sh autogen.sh
  ./configure --with-linux=/lib/modules/$(uname -r)/build \
              --with-zfs=/path/to/zfs \
              --disable-ldiskfs \
              --with-o2ib=/lib/modules/$(uname -r)/build
  make -j$(nproc)

The osd-zfs build emits the "Symbol version dump ... is missing" warning;
the resulting osd_zfs.ko has unresolved zfs/spl symbols and fails to load.
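
To see which symbols are involved, the module's undefined symbols can be
matched against the OpenZFS Module.symvers. This is a rough sketch:
zfs_symbols_needed is a hypothetical helper name, and it assumes the
usual tab-separated Module.symvers layout (CRC, symbol, module, ...).

```shell
# Read symbol names on stdin (one per line) and print those that the given
# Module.symvers exports, with the providing module.
# zfs_symbols_needed is a hypothetical helper name used only in this sketch.
zfs_symbols_needed() {
    symvers=$1
    awk -F'\t' '
        # First input: Module.symvers. Field 2 is the symbol, field 3 the module.
        FILENAME == ARGV[1] { provider[$2] = $3; next }
        # Second input (stdin): keep only symbols the ZFS tree exports.
        ($1 in provider)    { print $1 "\t" provider[$1] }
    ' "$symvers" -
}
```

Usage, with placeholder paths:

  nm -u path/to/osd_zfs.ko | awk '{print $NF}' \
      | zfs_symbols_needed /path/to/zfs/module/Module.symvers

The output is the set of zfs/spl symbols that osd_zfs.ko expects to
resolve at load time.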


Expected behavior
-----------------

./configure --with-zfs=<path> resolves the OpenZFS symbol-versions file
regardless of whether the OpenZFS source tree uses the pre-2.4 layout
(<path>/Module.symvers) or the 2.4+ layout (<path>/module/Module.symvers).


Actual behavior
---------------

KBUILD_EXTRA_SYMBOLS is written with the pre-2.4 path unconditionally.
osd-zfs builds without ZFS symbol-version data, and the resulting
osd_zfs.ko fails to load with unresolved zfs/spl symbols.


Workaround (tested)
-------------------

After the OpenZFS build completes and before Lustre ./configure, create
a symlink at the legacy location:

  ln -sf module/Module.symvers <zfs-source-root>/Module.symvers

Lustre ./configure then resolves Module.symvers correctly, osd-zfs
builds cleanly, and osd_zfs.ko loads. Validated end-to-end (Lustre
filesystem mounted, I/O measured) in the reproduce kit linked below.
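
For scripted builds, the symlink step can be guarded so it is safe to
rerun and does not touch a pre-2.4 tree. This is a minimal sketch:
link_legacy_symvers is a hypothetical helper name, not something
shipped by Lustre or OpenZFS.

```shell
# Create <zfs-root>/Module.symvers as a symlink to module/Module.symvers.
# link_legacy_symvers is a hypothetical helper name used only in this sketch.
link_legacy_symvers() {
    zfs_root=$1
    # A real (non-symlink) file at the root is the pre-2.4 layout: leave it.
    if [ -f "$zfs_root/Module.symvers" ] && [ ! -L "$zfs_root/Module.symvers" ]; then
        return 0
    fi
    if [ ! -f "$zfs_root/module/Module.symvers" ]; then
        echo "module/Module.symvers not found under $zfs_root; build OpenZFS first" >&2
        return 1
    fi
    # Relative target, resolved from the directory that holds the link.
    ln -sf module/Module.symvers "$zfs_root/Module.symvers"
}
```

Usage: run link_legacy_symvers /path/to/zfs after the OpenZFS build and
before Lustre ./configure.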


Suggested fix
-------------

./configure should probe both candidate paths and use whichever exists:

  if test -f "$with_zfs/module/Module.symvers"; then
      ZFS_SYMBOLS_PATH="$with_zfs/module/Module.symvers"
  elif test -f "$with_zfs/Module.symvers"; then
      ZFS_SYMBOLS_PATH="$with_zfs/Module.symvers"
  else
      AC_MSG_ERROR([cannot locate ZFS Module.symvers under $with_zfs])
  fi

(Or equivalently in the osd-zfs Kbuild that consumes the variable.)


Reference
---------

Public reproduce kit (build cascade documented end-to-end):

https://github.com/knachiketa04/aihomelab/tree/main/artifacts/training/lustre-on-uma-workstations/reproduce

Thanks,
Kumar
[email protected]