Am 2021-08-02 um 12:28 p.m. schrieb Philip Yang:
> For xnack on, if range ACCESS or ACCESS_IN_PLACE (AIP) by single GPU, or
> range is ACCESS_IN_PLACE by mGPUs and all mGPUs connection on xgmi same
> hive, the best prefetch location is prefetch_loc GPU. Otherwise, the best
> prefetch location is always CPU because GPU can not map vram of other
> GPUs through small bar PCIe.
>
> Signed-off-by: Philip Yang <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 44 +++++++++++++++++-----------
>  1 file changed, 27 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 69237d2ab2ad..6160c301f78b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -2754,22 +2754,29 @@ svm_range_add(struct kfd_process *p, uint64_t start, 
> uint64_t size,
>       return 0;
>  }
>  
> -/* svm_range_best_prefetch_location - decide the best prefetch location
> +/**
> + * svm_range_best_prefetch_location - decide the best prefetch location
>   * @prange: svm range structure
>   *
>   * For xnack off:
> - * If range map to single GPU, the best acutal location is prefetch loc, 
> which
> + * If range map to single GPU, the best prefetch location is prefetch_loc, 
> which
>   * can be CPU or GPU.
>   *
> - * If range map to multiple GPUs, only if mGPU connection on xgmi same hive,
> - * the best actual location could be prefetch_loc GPU. If mGPU connection on
> - * PCIe, the best actual location is always CPU, because GPU cannot access 
> vram
> - * of other GPUs, assuming PCIe small bar (large bar support is not 
> upstream).
> + * If range is ACCESS or ACCESS_IN_PLACE by mGPUs, only if mGPU connection on
> + * xgmi same hive, the best prefetch location is prefetch_loc GPU. If mGPUs
> + * connection on PCIe, the best prefetch location is always CPU, because GPU
> + * cannot map vram of other GPUs, assuming PCIe small bar (large bar support 
> is
> + * not upstream).
>   *
>   * For xnack on:
> - * The best actual location is prefetch location. If mGPU connection on xgmi
> - * same hive, range map to multiple GPUs. Otherwise, the range only map to
> - * actual location GPU. Other GPU access vm fault will trigger migration.
> + * If range map to single GPU or is not ACCESS_IN_PLACE by mGPUs, the best
> + * prefetch location is prefetch_loc, other GPU access vm fault will trigger
> + * migration.
> + *
> + * If range is ACCESS_IN_PLACE by mGPUs, only if mGPU connection on xgmi same
> + * hive, the best prefetch location is prefetch_loc GPU. If mGPUs connection 
> on
> + * PCIe, the best prefetch location is always CPU, because GPU cannot map 
> vram
> + * of other GPUs.
>   *
>   * Context: Process context
>   *
> @@ -2787,14 +2794,13 @@ svm_range_best_prefetch_location(struct svm_range 
> *prange)
>       struct kfd_process *p;
>       uint32_t gpuidx;
>  
> -     p = container_of(prange->svms, struct kfd_process, svms);
> -
> -     /* xnack on */
> -     if (p->xnack_enabled)
> +     if (!best_loc || best_loc == KFD_IOCTL_SVM_LOCATION_UNDEFINED)
>               goto out;
>  
> -     /* xnack off */
> -     if (!best_loc || best_loc == KFD_IOCTL_SVM_LOCATION_UNDEFINED)
> +     p = container_of(prange->svms, struct kfd_process, svms);
> +
> +     if (p->xnack_enabled &&
> +        bitmap_weight(prange->bitmap_aip, MAX_GPU_INSTANCE) <= 1)

I'm not sure why this condition is needed. I think there is an implicit
assumption that the prefetch_location is the one GPU that's set in the
bitmap_aip. But is that really a safe assumption? If you're prefetching
to a different GPU than the one that needs to access the memory
in-place, then the access is still going to trigger a page fault, so
it's not "in-place".

I think you can just drop this condition. It's a questionable short-cut
for something that's handled more correctly in the loop below.

Regards,
  Felix


>               goto out;
>  
>       bo_adev = svm_range_get_adev_by_id(prange, best_loc);
> @@ -2803,8 +2809,12 @@ svm_range_best_prefetch_location(struct svm_range 
> *prange)
>               best_loc = 0;
>               goto out;
>       }
> -     bitmap_or(bitmap, prange->bitmap_access, prange->bitmap_aip,
> -               MAX_GPU_INSTANCE);
> +
> +     if (p->xnack_enabled)
> +             bitmap_copy(bitmap, prange->bitmap_aip, MAX_GPU_INSTANCE);
> +     else
> +             bitmap_or(bitmap, prange->bitmap_access, prange->bitmap_aip,
> +                       MAX_GPU_INSTANCE);
>  
>       for_each_set_bit(gpuidx, bitmap, MAX_GPU_INSTANCE) {
>               pdd = kfd_process_device_from_gpuidx(p, gpuidx);

Reply via email to