"Sv. Lockal" <[email protected]> writes: > 1) With upcoming firmware AMD NPU will be exposed as /dev/accel/accel0. > This device is owned by root:render, similarly to GPU. > When present, tools like rocminfo try to query device capabilities, > breaking sandbox. To fix this issue, this device has now addwrite in > check_amdgpu. > > 2) There are a bunch of bugs from tinderbox and users who forgot to > enable KFD in kernel. Instead of recommendation to check permissions, > they will see a better message that AMD device is missing. > > 3) In cases when we just want to addwrite to AMD devices, new function > rocm_add_sandbox (similar to cuda_add_sandbox) was added. No errors are > raised if device is missing. >
LGTM. At some point, we need to figure out some better scheme for this
like some USE on acct-group/portage or whatever but it is what it is
(https://bugs.gentoo.org/955859) for now.

> Part-of: https://github.com/gentoo/gentoo/pull/44355
> Bug: https://bugs.gentoo.org/965198
> Signed-off-by: Sv. Lockal <[email protected]>
> ---
>  eclass/rocm.eclass | 37 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/eclass/rocm.eclass b/eclass/rocm.eclass
> index 0fa99a4178e71..c24666e33e8d6 100644
> --- a/eclass/rocm.eclass
> +++ b/eclass/rocm.eclass
> @@ -248,13 +248,44 @@ get_amdgpu_flags() {
>  	echo $(printf "%s;" ${AMDGPU_TARGETS[@]})
>  }
>
> +# @FUNCTION: rocm_add_sandbox
> +# @USAGE: [-w]
> +# @DESCRIPTION:
> +# Add AMD GPU/NPU dev nodes to the sandbox predict list.
> +# with -w, add to the sandbox write list.
> +rocm_add_sandbox() {
> +	debug-print-function "${FUNCNAME[0]}" "$@"
> +
> +	local i
> +	for i in /dev/kfd /dev/dri/render* /dev/accel/accel*; do
> +		if [[ ! -c $i ]]; then
> +			continue
> +		elif [[ $1 == '-w' ]]; then
> +			addwrite "$i"
> +		else
> +			addpredict "$i"
> +		fi
> +	done
> +}
> +
>  # @FUNCTION: check_amdgpu
>  # @USAGE: check_amdgpu
>  # @DESCRIPTION:
> -# grant and check read-write permissions on AMDGPU devices, die if not available.
> +# Grant and check read-write permissions on AMDGPU and AMDNPU devices.
> +# Die if no AMDGPU devices are available.
>  check_amdgpu() {
> -	for device in /dev/kfd /dev/dri/render*; do
> -		addwrite ${device}
> +	# Common case: no AMDGPU device or the kernel fusion driver is disabled in the kernel.
> +	if [[ ! -c /dev/kfd ]]; then
> +		eerror "Device /dev/kfd does not exist!"
> +		eerror "To proceed, you need to have an AMD GPU and have CONFIG_HSA_AMD set in your kernel config."
> +		die "/dev/kfd is missing"
> +	fi
> +
> +	local device
> +	for device in /dev/kfd /dev/dri/render* /dev/accel/accel*; do
> +		[[ ! -c ${device} ]] && continue
> +
> +		addwrite "${device}"
>  		if [[ ! -r ${device} || ! -w ${device} ]]; then
>  			eerror "Cannot read or write ${device}!"
>  			eerror "Make sure it is present and check the permission."
