Re: [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK

Christian König Tue, 19 May 2026 02:02:02 -0700

On 5/19/26 10:59, Timur Kristóf wrote:
> On Tuesday, May 19, 2026 10:54:10 AM Central European Summer Time Christian 
> König wrote:
>> On 5/19/26 10:22, Timur Kristóf wrote:
>>> UVD 4.x and older require that BOs don't cross 256M segments.
>>> We need to respect that in amdgpu_ttm_alloc_gart().
>>> We can't move the BOs later because GTT->GTT moves are
>>> not implemented. We also can't force all BOs to VRAM
>>> because that becomes very problematic in low VRAM scenarios.
>>>
>>> This fixes UVD CS BOs crossing 256M segments
>>> when they are placed in the GART.
>>
>> Clear NAK for that approach.
>>
>> This is the general TTM interface function and shouldn't have any HW
>> generation dependent code in it.
> 
> I don't see how else to solve this, since GTT->GTT moves are not implemented,
> so we can't move the BO to a suitable address later. We also can't move it to 
> VRAM.


GTT to GTT moves should be relatively easy to implement.

We just need to wait for the BO to be idle, unbind, move and bind again.

Regards,
Christian.

> 
> Please suggest a better approach if you don't like this one.
> 
> 
>>
>> Regards,
>> Christian.
>>
>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4799
>>> Signed-off-by: Timur Kristóf <[email protected]>
>>> ---
>>>
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 56 ++++++++++++++++++++++---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  3 ++
>>>  2 files changed, 53 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index
>>> 6c6ab4dd6ea9..a106c7e77e26 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -959,6 +959,40 @@ static int amdgpu_ttm_backend_bind(struct ttm_device
>>> *bdev,> 
>>>     return 0;
>>>  
>>>  }
>>>
>>> +/**
>>> + * amdgpu_ttm_fill_gart_256M_placements() - Fill placements array with
>>> 256M GART segments + *
>>> + * @bo: TTM buffer objects whose placements should be filled
>>> + * @placements: Pointer to an array of placements
>>> + * @max_placements: Size of the placements array
>>> + *
>>> + * Fill the specified placements array with 256M GART segments,
>>> + * starting from the highest address in order to reduce the
>>> + * contention of the lowest segment.
>>> + *
>>> + * Returns the number of placements filled.
>>> + */
>>> +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
>>> +                                    struct ttm_place 
> *placements,
>>> +                                    u32 max_placements)
>>> +{
>>> +   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
>>> +   u32 i;
>>> +
>>> +   /* Fill the placements array with 256M segments, starting from 
> highest.
>>> */ +        for (i = 0; i < max_placements; ++i) {
>>> +           if (i * SZ_256M >= adev->gmc.gart_size)
>>> +                   break;
>>> +
>>> +           placements[i].lpfn = (adev->gmc.gart_size - i * 
> SZ_256M) >> PAGE_SHIFT;
>>> +           placements[i].fpfn = ALIGN_DOWN(placements[i].lpfn - 1, 
> SZ_256M >>
>>> PAGE_SHIFT); +              placements[i].mem_type = TTM_PL_TT;
>>> +           placements[i].flags = bo->resource->placement;
>>> +   }
>>> +
>>> +   return i;
>>> +}
>>> +
>>>
>>>  /*
>>>  
>>>   * amdgpu_ttm_alloc_gart - Make sure buffer object is accessible either
>>>   * through AGP or GART aperture.
>>>
>>> @@ -973,7 +1007,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object
>>> *bo)> 
>>>     struct ttm_operation_ctx ctx = { false, false };
>>>     struct amdgpu_ttm_tt *gtt = ttm_to_amdgpu_ttm_tt(bo->ttm);
>>>     struct ttm_placement placement;
>>>
>>> -   struct ttm_place placements;
>>> +   struct ttm_place placements[AMDGPU_BO_MAX_PLACEMENTS];
>>>
>>>     struct ttm_resource *tmp;
>>>     uint64_t addr, flags;
>>>     int r;
>>>
>>> @@ -987,11 +1021,21 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object
>>> *bo)> 
>>>     /* allocate GART space */
>>>     placement.num_placement = 1;
>>>
>>> -   placement.placement = &placements;
>>> -   placements.fpfn = 0;
>>> -   placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
>>> -   placements.mem_type = TTM_PL_TT;
>>> -   placements.flags = bo->resource->placement;
>>> +   placement.placement = &placements[0];
>>> +   placements[0].fpfn = 0;
>>> +   placements[0].lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
>>> +   placements[0].mem_type = TTM_PL_TT;
>>> +   placements[0].flags = bo->resource->placement;
>>> +
>>> +   /*
>>> +    * UVD 4.x and older require that BOs don't cross 256M segments.
>>> +    * We need to respect that here. We can't move the BO later
>>> +    * because GTT->GTT moves are not implemented.
>>> +    */
>>> +   if (bo->base.size < SZ_256M && adev->family <= AMDGPU_FAMILY_KV)
>>> +           placement.num_placement =
>>> +                   amdgpu_ttm_fill_gart_256M_placements(bo, 
> placements,
>>> +                                                        
> ARRAY_SIZE(placements));
>>>
>>>     r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
>>>     if (unlikely(r))
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index
>>> 2d72fa217274..e9de628c8d2d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
>>> @@ -202,6 +202,9 @@ int amdgpu_ttm_clear_buffer(struct
>>> amdgpu_ttm_buffer_entity *entity,> 
>>>                         u64 k_job_id);
>>>  
>>>  struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct
>>>  amdgpu_device *adev);> 
>>> +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
>>> +                                    struct ttm_place 
> *placements,
>>> +                                    u32 max_placements);
>>>
>>>  int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
>>>  void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
>>>  uint64_t amdgpu_ttm_domain_start(struct amdgpu_device *adev, uint32_t
>>>  type);
> 
> 
> 
>

Re: [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK

Reply via email to