On 2024-11-20 22:58, Chen, Xiaogang wrote:
>>> @@ -1822,15 +1804,20 @@ struct kfd_process
>>> *kfd_lookup_process_by_pasid(u32 pasid)
>>> {
>>> struct kfd_process *p, *ret_p = NULL;
>>> unsigned int temp;
>>> + int i;
>>> int idx = srcu_read_lock(&kfd_processes_srcu);
>>> hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>>> - if (p->pasid == pasid) {
>>> - kref_get(&p->ref);
>>> - ret_p = p;
>>> - break;
>>> + for (i = 0; i < p->n_pdds; i++) {
>>> + if (p->pdds[i]->pasid == pasid) {
>>> + kref_get(&p->ref);
>>> + ret_p = p;
>>> + break;
>>> + }
>> I think this won't work correctly. The same PASID can be used for different
>> processes on different GPUs because each adev manages its own
>> PASID->amdgpu_vm lookup table. So kfd_lookup_process_by_pasid needs a new
>> parameter that identifies the GPU adev, and you should only compare pasids,
>> if the adev matches.
>
> I think it is the main concern here: the pasid used here is global in driver
> by amdgpu_pasid_alloc(16) at amdgpu_driver_open_kms. Each time a render
> node(partition) got opened, a new pasid value is generated. Its lifetime is
> until render node got closed. A pdd just uses this pasid. And each adev has
> its own xarray which saves pasids for this adev.
I think I had a misunderstanding here. I saw the xarray in adev and assumed
that each adev allocated PASIDs independently. But there is also a global
amdgpu_pasid_ida that I missed. If the PASID allocation is global in the
amdgpu_pasid_ida, then each PASID uniquely identifies a VM your code should be
fine.
@Christian, we discussed the number of PASIDs consumed on systems with many
GPUs and partitions. If the PASIDs are managed globally, then running out of
PASIDs is a concern. Do you think we should change this?
Regards,
Felix
>
> Regards
>
> Xiaogang