On 11/19/25 10:34, Pierre-Eric Pelloux-Prayer wrote:
>
>
> On 17/11/2025 at 10:46, Christian König wrote:
>> On 11/13/25 17:05, Pierre-Eric Pelloux-Prayer wrote:
>>> This will allow the use of all of them for clear/fill buffer
>>> operations.
>>> Since drm_sched_entity_init requires a scheduler array, we
>>> store schedulers rather than rings. For the few places that need
>>> access to a ring, we can get it from the sched using container_of.
>>>
>>> Since the code is the same for all sdma versions, add a new
>>> helper amdgpu_sdma_set_buffer_funcs_scheds to set buffer_funcs_scheds
>>> based on the number of sdma instances.
>>>
>>> Note: the new sched array is identical to the amdgpu_vm_manager one.
>>> These 2 could be merged.
>>
>> I realized a bit after we discussed it that this isn't true.
>>
>> We need the two arrays separated for a Navi 1x workaround to work correctly.
>
> Why 2 arrays? AFAICT the only needed thing is for amdgpu_ttm to be aware that
> it should only use a single sched in this situation.

So it could just use the first entry of the array for TTM and the full array
for VM updates.

Good point, I haven't thought about that possibility.

>
>>
>> Anyway, that doesn't affect reviewing this patch here.
>>
>>>
>>> Signed-off-by: Pierre-Eric Pelloux-Prayer
>>> <[email protected]>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  2 +
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c |  4 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  8 ++--
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c       |  4 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 41 +++++++++++++++----
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h       |  3 +-
>>>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c         |  3 +-
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c        |  3 +-
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c        |  3 +-
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c        |  6 +--
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c      |  6 +--
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c        |  6 +--
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c        |  6 +--
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c        |  3 +-
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c        |  3 +-
>>>  drivers/gpu/drm/amd/amdgpu/si_dma.c           |  3 +-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c      |  3 +-
>>>  17 files changed, 62 insertions(+), 45 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 3fab3dc9f3e4..05c13fb0e6bf 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -1615,6 +1615,8 @@ ssize_t amdgpu_get_soft_full_reset_mask(struct
>>> amdgpu_ring *ring);
>>>  ssize_t amdgpu_show_reset_mask(char *buf, uint32_t supported_reset);
>>>  void amdgpu_sdma_set_vm_pte_scheds(struct amdgpu_device *adev,
>>>  		const struct amdgpu_vm_pte_funcs *vm_pte_funcs);
>>> +void amdgpu_sdma_set_buffer_funcs_scheds(struct amdgpu_device *adev,
>>> +		const struct amdgpu_buffer_funcs *buffer_funcs);
>>>  /* atpx handler */
>>>  #if defined(CONFIG_VGA_SWITCHEROO)
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
>>> index b59040a8771f..9ea927e07a77 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c
>>> @@ -32,12 +32,14 @@ static int amdgpu_benchmark_do_move(struct
>>> amdgpu_device *adev, unsigned size,
>>>  				    uint64_t saddr, uint64_t daddr, int n, s64 *time_ms)
>>>  {
>>>  	ktime_t stime, etime;
>>> +	struct amdgpu_ring *ring;
>>>  	struct dma_fence *fence;
>>>  	int i, r;
>>> +	ring = container_of(adev->mman.buffer_funcs_scheds[0], struct
>>> amdgpu_ring, sched);
>>
>> We have the to_amdgpu_ring() macro for that.
>
> I'll update the patch, thx.
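
FWIW, assuming to_amdgpu_ring() keeps its usual definition in amdgpu_ring.h
(a plain container_of() wrapper over the embedded sched), those call sites
collapse to a one-liner, e.g. for amdgpu_benchmark_do_move():

	/* to_amdgpu_ring(s) is container_of((s), struct amdgpu_ring, sched) */
	ring = to_amdgpu_ring(adev->mman.buffer_funcs_scheds[0]);
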
>
>>
>>> +
>>>  	stime = ktime_get();
>>>  	for (i = 0; i < n; i++) {
>>> -		struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>>  		r = amdgpu_copy_buffer(ring, &adev->mman.default_entity.base,
>>>  				       saddr, daddr, size, NULL, &fence,
>>>  				       false, 0);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index b92234d63562..1927d940fbca 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -3303,7 +3303,7 @@ static int amdgpu_device_ip_init(struct amdgpu_device
>>> *adev)
>>>  	if (r)
>>>  		goto init_failed;
>>> -	if (adev->mman.buffer_funcs_ring->sched.ready)
>>> +	if (adev->mman.buffer_funcs_scheds[0]->ready)
>>>  		amdgpu_ttm_set_buffer_funcs_status(adev, true);
>>>  	/* Don't init kfd if whole hive need to be reset during init */
>>> @@ -4143,7 +4143,7 @@ static int amdgpu_device_ip_resume(struct
>>> amdgpu_device *adev)
>>>  	r = amdgpu_device_ip_resume_phase2(adev);
>>> -	if (adev->mman.buffer_funcs_ring->sched.ready)
>>> +	if (adev->mman.buffer_funcs_scheds[0]->ready)
>>
>> We should probably drop that check here and move this into
>> amdgpu_ttm_set_buffer_funcs_status().
>
> What should amdgpu_ttm_set_buffer_funcs_status() do if ready is false but
> enable is true? Exit early?

Yes, probably while printing a warning like "Not enabling DMA transfers for
in kernel use..." or something like that (rough sketch at the end of this
mail).

>>
>>>  		amdgpu_ttm_set_buffer_funcs_status(adev, true);
>>>  	if (r)
>>> @@ -4493,7 +4493,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>>  	adev->num_rings = 0;
>>>  	RCU_INIT_POINTER(adev->gang_submit, dma_fence_get_stub());
>>>  	adev->mman.buffer_funcs = NULL;
>>> -	adev->mman.buffer_funcs_ring = NULL;
>>> +	adev->mman.num_buffer_funcs_scheds = 0;
>>>  	adev->vm_manager.vm_pte_funcs = NULL;
>>>  	adev->vm_manager.vm_pte_num_scheds = 0;
>>>  	adev->gmc.gmc_funcs = NULL;
>>> @@ -5965,7 +5965,7 @@ int amdgpu_device_reinit_after_reset(struct
>>> amdgpu_reset_context *reset_context)
>>>  	if (r)
>>>  		goto out;
>>> -	if (tmp_adev->mman.buffer_funcs_ring->sched.ready)
>>> +	if (tmp_adev->mman.buffer_funcs_scheds[0]->ready)
>>>  		amdgpu_ttm_set_buffer_funcs_status(tmp_adev, true);
>>>  	r = amdgpu_device_ip_resume_phase3(tmp_adev);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>>> index 2713dd51ab9a..4433d8620129 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
>>> @@ -651,12 +651,14 @@ int amdgpu_gmc_allocate_vm_inv_eng(struct
>>> amdgpu_device *adev)
>>>  void amdgpu_gmc_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
>>>  			      uint32_t vmhub, uint32_t flush_type)
>>>  {
>>> -	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>> +	struct amdgpu_ring *ring;
>>>  	struct amdgpu_vmhub *hub = &adev->vmhub[vmhub];
>>>  	struct dma_fence *fence;
>>>  	struct amdgpu_job *job;
>>>  	int r, i;
>>> +	ring = container_of(adev->mman.buffer_funcs_scheds[0], struct
>>> amdgpu_ring, sched);
>>> +
>>>  	if (!hub->sdma_invalidation_workaround || vmid ||
>>>  	    !adev->mman.buffer_funcs_enabled || !adev->ib_pool_ready ||
>>>  	    !ring->sched.ready) {
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index 6c333dba7a35..11fec0fa4c11 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -308,7 +308,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct
>>> amdgpu_device *adev,
>>>  				       struct dma_resv *resv,
>>>  				       struct dma_fence **f)
>>>  {
>>> -	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>> +	struct amdgpu_ring *ring;
>>>  	struct amdgpu_res_cursor src_mm, dst_mm;
>>>  	struct dma_fence *fence = NULL;
>>>  	int r = 0;
>>> @@ -321,6 +321,8 @@ static int amdgpu_ttm_copy_mem_to_mem(struct
>>> amdgpu_device *adev,
>>>  		return -EINVAL;
>>>  	}
>>> +	ring = container_of(adev->mman.buffer_funcs_scheds[0], struct
>>> amdgpu_ring, sched);
>>> +
>>>  	amdgpu_res_first(src->mem, src->offset, size, &src_mm);
>>>  	amdgpu_res_first(dst->mem, dst->offset, size, &dst_mm);
>>> @@ -1493,6 +1495,7 @@ static int amdgpu_ttm_access_memory_sdma(struct
>>> ttm_buffer_object *bo,
>>>  	struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
>>>  	struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
>>>  	struct amdgpu_res_cursor src_mm;
>>> +	struct amdgpu_ring *ring;
>>>  	struct amdgpu_job *job;
>>>  	struct dma_fence *fence;
>>>  	uint64_t src_addr, dst_addr;
>>> @@ -1530,7 +1533,8 @@ static int amdgpu_ttm_access_memory_sdma(struct
>>> ttm_buffer_object *bo,
>>>  	amdgpu_emit_copy_buffer(adev, &job->ibs[0], src_addr, dst_addr,
>>>  				PAGE_SIZE, 0);
>>> -	amdgpu_ring_pad_ib(adev->mman.buffer_funcs_ring, &job->ibs[0]);
>>> +	ring = container_of(adev->mman.buffer_funcs_scheds[0], struct
>>> amdgpu_ring, sched);
>>> +	amdgpu_ring_pad_ib(ring, &job->ibs[0]);
>>>  	WARN_ON(job->ibs[0].length_dw > num_dw);
>>>  	fence = amdgpu_job_submit(job);
>>> @@ -2196,11 +2200,9 @@ u32 amdgpu_ttm_set_buffer_funcs_status(struct
>>> amdgpu_device *adev, bool enable)
>>>  		return windows;
>>>  	if (enable) {
>>> -		struct amdgpu_ring *ring;
>>>  		struct drm_gpu_scheduler *sched;
>>> -		ring = adev->mman.buffer_funcs_ring;
>>> -		sched = &ring->sched;
>>> +		sched = adev->mman.buffer_funcs_scheds[0];
>>>  		r = drm_sched_entity_init(&adev->mman.default_entity.base,
>>>  					  DRM_SCHED_PRIORITY_KERNEL, &sched,
>>>  					  1, NULL);
>>> @@ -2432,7 +2434,7 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>>>  			    struct dma_fence **fence)
>>>  {
>>>  	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>>> -	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>> +	struct amdgpu_ring *ring;
>>>  	struct amdgpu_ttm_buffer_entity *entity;
>>>  	struct amdgpu_res_cursor cursor;
>>>  	u64 addr;
>>> @@ -2443,6 +2445,8 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>>>  	if (!fence)
>>>  		return -EINVAL;
>>> +
>>> +	ring = container_of(adev->mman.buffer_funcs_scheds[0], struct
>>> amdgpu_ring, sched);
>>>  	entity = &adev->mman.clear_entities[0];
>>>  	*fence = dma_fence_get_stub();
>>> @@ -2494,9 +2498,9 @@ int amdgpu_fill_buffer(struct
>>> amdgpu_ttm_buffer_entity *entity,
>>>  			 u64 k_job_id)
>>>  {
>>>  	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>>> -	struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
>>>  	struct dma_fence *fence = NULL;
>>>  	struct amdgpu_res_cursor dst;
>>> +	struct amdgpu_ring *ring;
>>>  	int r, e;
>>>  	if (!adev->mman.buffer_funcs_enabled) {
>>> @@ -2505,6 +2509,8 @@ int amdgpu_fill_buffer(struct
>>> amdgpu_ttm_buffer_entity *entity,
>>>  		return -EINVAL;
>>>  	}
>>> +	ring = container_of(adev->mman.buffer_funcs_scheds[0], struct
>>> amdgpu_ring, sched);
>>> +
>>>  	if (entity == NULL) {
>>>  		e = atomic_inc_return(&adev->mman.next_clear_entity) %
>>>  		    adev->mman.num_clear_entities;
>>> @@ -2579,6 +2585,27 @@ int amdgpu_ttm_evict_resources(struct amdgpu_device
>>> *adev, int mem_type)
>>>  	return ttm_resource_manager_evict_all(&adev->mman.bdev, man);
>>>  }
>>> +void amdgpu_sdma_set_buffer_funcs_scheds(struct amdgpu_device *adev,
>>> +					 const struct amdgpu_buffer_funcs *buffer_funcs)
>>> +{
>>> +	struct amdgpu_vmhub *hub = &adev->vmhub[AMDGPU_GFXHUB(0)];
>>> +	struct drm_gpu_scheduler *sched;
>>> +	int i;
>>> +
>>> +	adev->mman.buffer_funcs = buffer_funcs;
>>> +
>>> +	for (i = 0; i < adev->sdma.num_instances; i++) {
>>> +		if (adev->sdma.has_page_queue)
>>> +			sched = &adev->sdma.instance[i].page.sched;
>>> +		else
>>> +			sched = &adev->sdma.instance[i].ring.sched;
>>> +		adev->mman.buffer_funcs_scheds[i] = sched;
>>> +	}
>>> +
>>> +	adev->mman.num_buffer_funcs_scheds = hub->sdma_invalidation_workaround ?
>>> +		1 : adev->sdma.num_instances;
>>> +}
>>> +
>>
>> Probably better to make all SDMA versions switch to use
>> amdgpu_sdma_set_buffer_funcs_scheds() in one patch and then change
>> amdgpu_sdma_set_buffer_funcs_scheds() to use more than one DMA engine in a
>> second patch.
>
> I'm not sure it's useful: this patch simply creates an array of schedulers,
> but every user of this array only uses the first sched.
> Enabling multiple schedulers usage is done in the "drm/amdgpu: give ttm
> entities access to all the sdma scheds" patch.

Oh, good point as well. Yeah, then in that case please go ahead with what you
have currently.

Thanks,
Christian.

>
> Pierre-Eric
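
PS: a rough, untested sketch of the early-exit idea for
amdgpu_ttm_set_buffer_funcs_status() from above; the exact warning text and
the 0-windows return value are placeholders:

	/* at the top of amdgpu_ttm_set_buffer_funcs_status(): */
	if (enable && !adev->mman.buffer_funcs_scheds[0]->ready) {
		/* SDMA sched isn't up yet, skip enabling instead of failing */
		dev_warn(adev->dev,
			 "Not enabling DMA transfers for in kernel use, SDMA not ready\n");
		return 0;
	}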
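
PPS: to make Pierre-Eric's point concrete, with the helper each per-version
setter presumably already shrinks to a single call in this patch. Illustration
only (sdma_v5_0 shown, not the actual hunk):

	static void sdma_v5_0_set_buffer_funcs(struct amdgpu_device *adev)
	{
		/* the common helper now fills mman.buffer_funcs and the
		 * sched array instead of setting buffer_funcs_ring by hand
		 */
		amdgpu_sdma_set_buffer_funcs_scheds(adev, &sdma_v5_0_buffer_funcs);
	}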
