Currently kfd manages kfd_process in a one context (kfd_process) per program manner, thus each user space program only onws one kfd context (kfd_process).
This model works fine for most of the programs, but imperfect for a hypervisor like QEMU. Because all programs in the guest user space share the same only one kfd context, which is problematic, including but not limited to: As illustrated in Figure 1, all guest user space programs share the same fd of /dev/kfd and the same kfd_process, and the same PASID leading to the same GPU_VM address space. Therefore the IOVA range of each guest user space programs are not isolated, they can attack each other through GPU DMA. +----------------------------------------------------------------------------------+ | | | +-----------+ +-----------+ +------------+ +------------+ | | | | | | | | | | | | | Program 1 | | Program 2 | | Program 3 | | Program N | | | | | | | | | | | | | +----+------+ +--------+--+ +--+---------+ +-----+------+ | | | | | | | | | | | | Guest | | | | | | | +-------+----------------------+------------+----------------------+---------------+ | | | | | | | | | | | | | | | | | +--+------------+---+ | | | file descriptor | | +-------------------+ of /dev/kfd +------------------+ | opened by QEMU | | | +---------+---------+ User Space | QEMU | ---------------------------------------+----------------------------------------------------- | Kernel Space | KFD Module | +--------+--------+ | | | kfd_process |<------------------The only one KFD context | | +--------+--------+ | +--------+--------+ | PASID | +--------+--------+ | +--------+--------+ | GPU_VM | +-----------------+ Fiture 1 This series implements a multiple contexts solution: - Allow each program to create and hold multiple contexts (kfd processes) - Each context has its own fd of /dev/kfd and an exclusive kfd_process, which is a secondary kfd context. So that PASID/GPU VM isolates their IOVA address spaces. Therefore, they can not attack each other through GPU DMA. The design is illustrated in Figure 2 below: +---------------------------------------------------------------------------------------------------------+ | | | | | | | +----------------------------------------------------------------------------------+ | | | | | | | +-----------+ +-----------+ +-----------+ +-----------+ | | | | | | | | | | | | | | | | | Program 1 | | Program 2 | | Program 3 | | Program N | | | | | | | | | | | | | | | | | +-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+ | | | | | | | | | | | | | | | | Guest | | | | | | | | | | | +-------+------------------+-----------------+----------------+--------------------+ | | | | | | QEMU | | | | | | | +---------------+------------------+-----------------+----------------+--------------------------+--------+ | | | | | | | | | | | | | | | +---+----+ +---+----+ +---+----+ +---+----+ +---+-----+ | | | | | | | | | Primary | | FD 1 | | FD 2 | | FD 3 | | FD 4 | | FD | | | | | | | | | | | +---+----+ +---+----+ +---+----+ +----+---+ +----+----+ | | | | | User Space | | | | | -------------------+------------------+-----------------+-----------------+--------------------------+---------------------------- | | | | | Kernel SPace | | | | | | | | | | +--------------------------------------------------------------------------------------------------------------------------+ | +------+------+ +------+------+ +------+------+ +------+------+ +------+------+ | | | Secondary | | Secondary | | Secondary | | Secondary | | Primary | KFD Module | | |kfd_process 1| |kfd_process 2| |kfd_process 3| |kfd_process 4| | kfd_process | | | | | | | | | | | | | | | +------+------+ +------+------+ +------+------+ +------+------+ +------+------+ | | | | | | | | | +------+------+ +------+------+ +------+------+ +------+------+ +------+------+ | | | PASID | | PASID | | PASID | | PASID | | PASID | | | +------+------+ +------+------+ +------+------+ +------+------+ +------+------+ | | | | | | | | | | | | | | | | +------+------+ +------+------+ +------+------+ +------+------+ +------+------+ | | | GPU_VM | | GPU_VM | | GPU_VM | | GPU_VM | | GPU_VM | | | +-------------+ +-------------+ +-------------+ +-------------+ +-------------+ | | | +--------------------------------------------------------------------------------------------------------------------------+ Figure 2 The relevant reference user space rocm changes could be found at: https://github.com/AMD-ROCm-Internal/rocm-systems/pull/78 https://github.com/AMD-ROCm-Internal/rocm-systems/pull/110 Thanks! Zhu Lingshan (18): amdkfd: enlarge the hashtable of kfd_process amdkfd: mark the first kfd_process as the primary one amdkfd: find_process_by_mm always return the primary context amdkfd: Introduce kfd_create_process_sysfs as a separate function amdkfd: destroy kfd secondary contexts through fd close amdkfd: process svm ioctl only on the primary kfd process amdkfd: process USERPTR allocation only on the primary kfd process amdkfd: identify a secondary kfd process by its id amdkfd: find kfd_process by filep->private_data in kfd_mmap amdkfd: remove DIQ support amdkfd: process pointer of a HIQ should be NULL amdkfd: remove test_kq amdkfd: introduce new helper kfd_lookup_process_by_id amdkfd: record kfd process id into kfd process_info amdkfd: record kfd process id in amdkfd_fence amdkfd: fence handler evict and restore a kfd process by its id amdkfd: set_debug_trap ioctl only works on a primary kfd_process target amdkfd: introduce new ioctl AMDKFD_IOC_CREATE_PROCESS drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 8 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 10 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 8 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 102 +++++++- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 7 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 6 +- drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 61 +---- .../drm/amd/amdkfd/kfd_packet_manager_v9.c | 4 - .../drm/amd/amdkfd/kfd_packet_manager_vi.c | 4 - drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 17 +- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 244 +++++++++++++----- .../amd/amdkfd/kfd_process_queue_manager.c | 39 +-- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +- include/uapi/linux/kfd_ioctl.h | 8 +- 14 files changed, 321 insertions(+), 199 deletions(-) -- 2.51.0