On 10/18/2025 7:58 AM, Felix Kuehling wrote:
> See my comments on patches 8, 17 and 18. The remaining patches look
> good and are
>
> Reviewed-by: Felix Kuehling <[email protected]>
>
> I think this patch series is nearly ready, pending actual user mode
> code that uses it. 

Thanks, do you think it is a must to merge user space thunk/kfdtest code first?

Thanks
Lingshan

>
> Regards,
>   Felix
>
>
> On 2025-10-17 04:42, Zhu Lingshan wrote:
>> Currently kfd manages kfd_process in a one context (kfd_process)
>> per program manner, thus each user space program
>> only onws one kfd context (kfd_process).
>>
>> This model works fine for most of the programs, but imperfect
>> for a hypervisor like QEMU. Because all programs in the guest
>> user space share the same only one kfd context, which is
>> problematic, including but not limited to:
>>
>> As illustrated in Figure 1, all guest user space programs share the
>> same fd of /dev/kfd
>> and the same kfd_process, and the same PASID leading to the same
>> GPU_VM address space. Therefore the IOVA range of each
>> guest user space programs are not isolated,
>> they can attack each other through GPU DMA.
>>
>>
>>  
>> +----------------------------------------------------------------------------------+
>>  
>> |                                                                            
>>      
>> |
>>   |  +-----------+      +-----------+      +------------+     
>> +------------+        |
>>   |  |           |      |           |      |            |     
>> |            |        |
>>   |  | Program 1 |      | Program 2 |      | Program 3  |      |
>> Program N  |        |
>>   |  |           |      |           |      |            |     
>> |            |        |
>>   |  +----+------+      +--------+--+      +--+---------+     
>> +-----+------+        |
>>   |       |                      |            |                     
>> |               |
>>   |       |                      |            |                     
>> |        Guest  |
>>   |       |                      |            |                     
>> |               |
>>  
>> +-------+----------------------+------------+----------------------+---------------+
>>           |                      |            |                      |
>>           |                      |            |                      |
>>           |                      |            |                      |
>>           |                      |            |                      |
>>           |                   +--+------------+---+                  |
>>           |                   | file descriptor   |                  |
>>           +-------------------+ of /dev/kfd       +------------------+
>>                               | opened by QEMU    |
>>                               |                   |
>>                               +---------+---------+                  
>> User Space
>>                                         |                            
>> QEMU
>>                                         |
>> ---------------------------------------+-----------------------------------------------------
>>
>>                                         |                            
>> Kernel Space
>>                                         |                            
>> KFD Module
>>                                         |
>>                                +--------+--------+
>>                                |                 |
>>                                |   kfd_process  
>> |<------------------The only one KFD context
>>                                |                 |
>>                                +--------+--------+
>>                                         |
>>                                +--------+--------+
>>                                |     PASID       |
>>                                +--------+--------+
>>                                         |
>>                                +--------+--------+
>>                                |      GPU_VM     |
>>                                +-----------------+
>>
>>                                   Fiture 1
>>
>>
>> This series implements a multiple contexts solution:
>> - Allow each program to create and hold multiple contexts (kfd
>> processes)
>> - Each context has its own fd of /dev/kfd and an exclusive kfd_process,
>>    which is a secondary kfd context. So that PASID/GPU VM isolates
>> their IOVA address spaces.
>>    Therefore, they can not attack each other through GPU DMA.
>>
>> The design is illustrated in Figure 2 below:
>>
>>    
>> +---------------------------------------------------------------------------------------------------------+
>>    
>> |                                                                            
>>                             
>> |
>>    
>> |                                                                            
>>                             
>> |
>>    
>> |                                                                            
>>                             
>> |
>>     |      
>> +----------------------------------------------------------------------------------+
>>              
>> |
>>     |      
>> |                                                                            
>>      
>> |              |
>>     |       | +-----------+      +-----------+     +-----------+   
>> +-----------+              |              |
>>     |       | |           |      |           |     |           |   
>> |           |              |              |
>>     |       | | Program 1 |      | Program 2 |     | Program 3 |    |
>> Program N |              |              |
>>     |       | |           |      |           |     |           |   
>> |           |              |              |
>>     |       | +-----+-----+      +-----+-----+     +-----+-----+   
>> +-----+-----+              |              |
>>     |       |       |                  |                
>> |                |                    |              |
>>     |       |       |                  |                
>> |                |        Guest       |              |
>>     |       |       |                  |                
>> |                |                    |              |
>>     |      
>> +-------+------------------+-----------------+----------------+--------------------+
>>              
>> |
>>     |               |                  |                
>> |                |                            QEMU   |
>>     |               |                  |                
>> |                |                                   |
>>    
>> +---------------+------------------+-----------------+----------------+--------------------------+--------+
>>                     |                  |                
>> |                |                          |
>>                     |                  |                
>> |                |                          |
>>                     |                  |                
>> |                |                          |
>>                 +---+----+         +---+----+        +---+----+      
>> +---+----+                 +---+-----+
>>                 |        |         |        |        |        |      
>> |        |                 | Primary |
>>                 |  FD 1  |         |  FD 2  |        |  FD 3  |      
>> |  FD 4  |                 |   FD    |
>>                 |        |         |        |        |        |      
>> |        |                 |         |
>>                 +---+----+         +---+----+        +---+----+      
>> +----+---+                 +----+----+
>>                     |                  |                
>> |                 |                          |             User Space
>>                     |                  |                
>> |                 |                          |
>> -------------------+------------------+-----------------+-----------------+--------------------------+----------------------------
>>
>>                     |                  |                
>> |                 |                          |             Kernel SPace
>>                     |                  |                
>> |                 |                          |
>>                     |                  |                
>> |                 |                          |
>>    
>> +--------------------------------------------------------------------------------------------------------------------------+
>>     |        +------+------+    +------+------+   +------+------+  
>> +------+------+            +------+------+                 |
>>     |        | Secondary   |    | Secondary   |   | Secondary   |   |
>> Secondary   |            |  Primary    |   KFD Module    |
>>     |        |kfd_process 1|    |kfd_process 2|   |kfd_process 3|  
>> |kfd_process 4|            | kfd_process |                 |
>>     |        |             |    |             |   |             |  
>> |             |            |             |                 |
>>     |        +------+------+    +------+------+   +------+------+  
>> +------+------+            +------+------+                 |
>>     |               |                  |                
>> |                 |                          |                        |
>>     |        +------+------+    +------+------+   +------+------+  
>> +------+------+            +------+------+                 |
>>     |        |   PASID     |    |   PASID     |   |   PASID     |  
>> |   PASID     |            |   PASID     |                 |
>>     |        +------+------+    +------+------+   +------+------+  
>> +------+------+            +------+------+                 |
>>     |               |                  |                
>> |                 |                          |                        |
>>     |               |                  |                
>> |                 |                          |                        |
>>     |        +------+------+    +------+------+   +------+------+  
>> +------+------+            +------+------+                 |
>>     |        |   GPU_VM    |    |   GPU_VM    |   |   GPU_VM    |  
>> |   GPU_VM    |            |   GPU_VM    |                 |
>>     |        +-------------+    +-------------+   +-------------+  
>> +-------------+            +-------------+                 |
>>    
>> |                                                                            
>>                                              
>> |
>>    
>> +--------------------------------------------------------------------------------------------------------------------------+
>>                                                                              
>>                                                      
>>                                                    Figure 2
>>
>> The relevant reference user space rocm changes could be found at:
>> https://github.com/AMD-ROCm-Internal/rocm-systems/pull/78
>> https://github.com/AMD-ROCm-Internal/rocm-systems/pull/110
>>
>> Changes from V4:
>> 1) rename process_id to context_id in struct kfd_process
>> 2) remove primary flag in struct kfd_process
>> 3) reject set_debug_trap ioctl request when
>> the target kfd_process is non-primary
>> 4) other than default 0, assign context_id 0xFFFF to the primary kfd
>> process
>>
>> Most of the patches have been changed, so I removed
>> their signed-off-by tag.
>>
>> Please help review
>>
>> Thanks!
>>
>> Zhu Lingshan (18):
>>    amdkfd: enlarge the hashtable of kfd_process
>>    amdkfd: mark the first kfd_process as the primary one
>>    amdkfd: find_process_by_mm always return the primary context
>>    amdkfd: Introduce kfd_create_process_sysfs as a separate function
>>    amdkfd: destroy kfd secondary contexts through fd close
>>    amdkfd: process svm ioctl only on the primary kfd process
>>    amdkfd: process USERPTR allocation only on the primary kfd process
>>    amdkfd: identify a secondary kfd process by its id
>>    amdkfd: find kfd_process by filep->private_data in kfd_mmap
>>    amdkfd: remove DIQ support
>>    amdkfd: process pointer of a HIQ should be NULL
>>    amdkfd: remove test_kq
>>    amdkfd: introduce new helper kfd_lookup_process_by_id
>>    amdkfd: record kfd context id into kfd process_info
>>    amdkfd: record kfd context id in amdkfd_fence
>>    amdkfd: fence handler evict and restore a kfd process by its context
>>      id
>>    amdkfd: process debug trap ioctl only on a primary context
>>    amdkfd: introduce new ioctl AMDKFD_IOC_CREATE_PROCESS
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   8 +-
>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c  |  10 +-
>>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   8 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  79 +++++-
>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c       |   7 +-
>>   .../drm/amd/amdkfd/kfd_device_queue_manager.c |   6 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c |  61 +----
>>   .../drm/amd/amdkfd/kfd_packet_manager_v9.c    |   4 -
>>   .../drm/amd/amdkfd/kfd_packet_manager_vi.c    |   4 -
>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  19 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 239 +++++++++++++-----
>>   .../amd/amdkfd/kfd_process_queue_manager.c    |  39 +--
>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |   2 +-
>>   include/uapi/linux/kfd_ioctl.h                |   8 +-
>>   14 files changed, 294 insertions(+), 200 deletions(-)
>>

Reply via email to