Teach xe_bo_move() to service a defrag move, where TTM reallocates a BO's
backing at the device's beneficial order and hands the old, still
populated tt to the move via the BO's defrag_old_tt.

A defrag move always has existing contents to copy, so force
move_lacks_source false and fall through the XE_PL_TT -> XE_PL_TT no-op
shortcut so the copy path runs.

The relocation is performed entirely on the GPU via
xe_migrate_copy_defrag(), reading the old pages from the stashed tt's sg
table and writing the freshly reallocated backing. Both copy passes run in
order on the same migrate queue and are captured by the returned fence, so
the teardown is pipelined via ttm_bo_move_accel_cleanup(): the old tt is
handed to a ghost object and only unpopulated and freed once the fence
signals, so the move need not block.

The VF CCS attach/detach handling is extended to treat a defrag move
(ctx->defrag) like the XE_PL_SYSTEM <-> XE_PL_TT transitions it mirrors.

Cc: Carlos Santa <[email protected]>
Cc: Ryan Neph <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Simona Vetter <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Thomas Hellström <[email protected]>
Assisted-by: GitHub_Copilot:claude-opus-4.8
Signed-off-by: Matthew Brost <[email protected]>
---
 drivers/gpu/drm/xe/xe_bo.c | 42 +++++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 3c185200419e..04709343518c 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1084,6 +1084,13 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 	move_lacks_source = !old_mem || (handle_system_ccs ? (!bo->ccs_cleared) :
 					 (!mem_type_is_vram(old_mem_type) && !tt_has_data));
 
+	/*
+	 * A defrag move always copies the existing contents from the stashed
+	 * old tt into the freshly (re)allocated backing.
+	 */
+	if (ttm_bo->defrag_old_tt)
+		move_lacks_source = false;
+
 	needs_clear = (ttm && ttm->page_flags & TTM_TT_FLAG_ZERO_ALLOC) ||
 		(!ttm && ttm_bo->type == ttm_bo_type_device);
 
@@ -1120,9 +1127,12 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 	/*
 	 * Failed multi-hop where the old_mem is still marked as
 	 * TTM_PL_FLAG_TEMPORARY, should just be a dummy move.
+	 *
+	 * For a defrag move the old and new tt differ, so fall through to the
+	 * copy path instead of treating it as a no-op.
 	 */
 	if (old_mem_type == XE_PL_TT &&
-	    new_mem->mem_type == XE_PL_TT) {
+	    new_mem->mem_type == XE_PL_TT && !ttm_bo->defrag_old_tt) {
 		ttm_bo_move_null(ttm_bo, new_mem);
 		goto out;
 	}
@@ -1193,6 +1203,12 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 			flags |= XE_MIGRATE_CLEAR_FLAG_CCS_DATA;
 
 		fence = xe_migrate_clear(migrate, bo, new_mem, flags);
+	} else if (ttm_bo->defrag_old_tt) {
+		struct xe_ttm_tt *old_xe_tt =
+			container_of(ttm_bo->defrag_old_tt, struct xe_ttm_tt, ttm);
+
+		fence = xe_migrate_copy_defrag(migrate, bo, old_mem, new_mem,
+					       old_xe_tt->sg, handle_system_ccs);
 	} else {
 		fence = xe_migrate_copy(migrate, bo, bo, old_mem, new_mem,
 					handle_system_ccs);
@@ -1202,7 +1218,23 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 		xe_pm_runtime_put(xe);
 		goto out;
 	}
-	if (!move_lacks_source) {
+	if (ttm_bo->defrag_old_tt) {
+		/*
+		 * The defrag move relocates the data (and the CCS aux state for
+		 * compressed BOs) from the old backing to the new one entirely
+		 * on the GPU, all captured by @fence. Pipeline the teardown via
+		 * ttm_bo_move_accel_cleanup(): the old tt is handed to a ghost
+		 * object and only unpopulated and freed once @fence signals, so
+		 * the move need not block here.
+		 */
+		ret = ttm_bo_move_accel_cleanup(ttm_bo, fence, evict,
+						true, new_mem);
+		if (ret) {
+			dma_fence_wait(fence, false);
+			ttm_bo_move_null(ttm_bo, new_mem);
+			ret = 0;
+		}
+	} else if (!move_lacks_source) {
 		ret = ttm_bo_move_accel_cleanup(ttm_bo, fence, evict, true,
 						new_mem);
 		if (ret) {
@@ -1229,13 +1261,13 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 	 * BBs from BO as it is no longer needed.
 	 */
 	if (IS_VF_CCS_READY(xe) && old_mem_type == XE_PL_TT &&
-	    new_mem->mem_type == XE_PL_SYSTEM)
+	    (new_mem->mem_type == XE_PL_SYSTEM || ctx->defrag))
 		xe_sriov_vf_ccs_detach_bo(bo);
 
 	if (IS_VF_CCS_READY(xe) &&
 	    ((move_lacks_source && new_mem->mem_type == XE_PL_TT) ||
-	     (old_mem_type == XE_PL_SYSTEM && new_mem->mem_type == XE_PL_TT)) &&
-	    handle_system_ccs)
+	     ((old_mem_type == XE_PL_SYSTEM || ctx->defrag) &&
+	      new_mem->mem_type == XE_PL_TT)) && handle_system_ccs)
 		ret = xe_sriov_vf_ccs_attach_bo(bo);
 
 out:
-- 
2.34.1
