This series adds a "page defragmentation" mechanism that lets a driver re-back a buffer object with higher-order (beneficial-order) pages after it was initially allocated with a sub-optimal, lower-order set of pages under memory pressure. Long-lived objects that happened to be allocated while memory was fragmented otherwise keep their scattered backing forever, costing TLB efficiency for the lifetime of the object; this series gives them a chance to be promoted to the device's beneficial order once higher-order pages become available again.
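As a rough illustration of the fallback and later promotion described above, the userspace C sketch below models a pool that backs a BO from the largest free contiguous block order first. Everything in it is hypothetical (model_tt, model_pool_alloc(), MODEL_MAX_ORDER and the free_blocks[] counts); only the beneficial_order_failed name mirrors the ttm_tt field this series adds. It also assumes the flag is set only when a beneficial-order block would still have fit, so small BOs are never flagged; that is a choice of this sketch, not necessarily how ttm_pool_alloc() decides.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_ORDER 10 /* hypothetical largest order the model tries */

/* Hypothetical stand-in for a ttm_tt: only the fields this sketch needs. */
struct model_tt {
	unsigned int num_pages;       /* size of the BO in pages */
	bool beneficial_order_failed; /* mirrors ttm_tt::beneficial_order_failed */
};

/*
 * Back tt->num_pages pages from free_blocks[], where free_blocks[o] is the
 * number of free contiguous blocks of 2^o pages. Larger orders are tried
 * first. Falling back below beneficial_order while a beneficial-order block
 * would still have fit marks the tt as sub-optimal (an assumption of this
 * model). Returns 0 on success, -1 when the model runs out of memory.
 */
static int model_pool_alloc(struct model_tt *tt,
			    unsigned int free_blocks[MODEL_MAX_ORDER + 1],
			    unsigned int beneficial_order)
{
	unsigned int remaining = tt->num_pages;
	unsigned int order = MODEL_MAX_ORDER;

	assert(beneficial_order <= MODEL_MAX_ORDER);

	tt->beneficial_order_failed = false;
	while (remaining) {
		/* Skip orders that are too large for what is left or exhausted. */
		while (order > 0 &&
		       ((1u << order) > remaining || free_blocks[order] == 0))
			order--;
		if (free_blocks[order] == 0)
			return -1;
		if (order < beneficial_order && remaining >= (1u << beneficial_order))
			tt->beneficial_order_failed = true;
		free_blocks[order]--;
		remaining -= 1u << order;
	}
	return 0;
}
```

For example, backing a 64-page BO with a beneficial order of 4 when only order-0 blocks are free sets the flag; repeating the allocation once order-4 blocks are free again clears it, which is the promotion the defrag move aims for.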
The TTM core grows the generic plumbing (order-failure tracking, a defrag move that reallocates a populated BO's backing in place, and a reclaim backoff knob), and the Xe driver wires up the policy: it tracks the affected BOs on a per-device list and runs a delayed worker that periodically tries to defragment them on the GPU.

Overview
========

ttm core
--------

- Record sub-optimal page-order allocations in ttm_tt.
  ttm_pool_alloc() sets ttm_tt::beneficial_order_failed when it has to fall back below the pool's beneficial order, so drivers can tell which BOs are backed with scattered pages.

- Allow backing off reclaim at the beneficial order.
  A new ttm_operation_ctx::beneficial_reclaim_backoff flag makes the pool skip direct reclaim/compaction (and the __GFP_RETRY_MAYFAIL promotion) for the beneficial order, so a caller that knows beneficial-order allocations are currently failing doesn't burn CPU stalling for a contiguous page that isn't there.

- Support defragmentation moves.
  ttm_operation_ctx::defrag forces a move that reallocates a populated BO's backing in place at the beneficial order, stashing the old, still-populated tt (bo->defrag_old_tt) and handing it to the driver move callback so the contents can be copied from the old pages to the new ones. The move is non-destructive: if beneficial-order pages can't be obtained, the attempt aborts immediately and is fully unwound, with the BO restored to its original, still-mapped tt and placement.

- Fault injection for beneficial-order allocation failures.
  A beneficial_order_fault_inject debugfs knob (mirroring backup_fault_inject) forces the beneficial-order allocation to fail, so the tracking and the driver defrag path can be exercised deterministically without driving the system into real fragmentation.

Xe driver
---------

- Track BOs backed at a sub-optimal page order.
  ttm_bo_type_device BOs resident in XE_PL_TT with beneficial_order_failed set are kept on a per-device list (with an atomic count); membership is updated from the move path, and a BO is dropped from the list when it is pinned.

- Back off beneficial-order reclaim under defrag pressure.
  When enough BOs are already pending defrag, Xe sets beneficial_reclaim_backoff so new allocations fall back quickly instead of stalling.

- Add xe_migrate_copy_defrag() for on-GPU defrag copies.
  It relocates a BO's contents from the old pages to the freshly reallocated ones entirely on the GPU, in up to two passes: a verbatim data copy with the compression PAT cleared, then (for compressed BOs) an indirect -> indirect CCS aux copy with the compression PAT set. Two passes are required because the data copy and the indirect CCS access need opposite compression PAT settings on the same page mapping.

- Handle defrag moves in xe_bo_move().
  xe_bo_move() detects bo->defrag_old_tt, runs the GPU copy from the stashed old tt's sg table, and pipelines the teardown via ttm_bo_move_accel_cleanup() so the old backing is unpopulated/freed only once the copy fence signals. The defrag move is fully pipelined with fences: rather than stalling, it inserts the copy into the GPU execution pipeline and lets the BO's normal fence dependencies order it against concurrent work.

- Add a page defragmentation worker.
  A delayed worker walks the per-device list and attempts a defrag move on each listed BO. It is kicked when the list first becomes non-empty, caps the number of BOs processed per run (XE_BO_DEFRAG_NUM_BO_LIMIT_PER_WORK) to avoid starving active work, backs off exponentially on failure (XE_BO_DEFRAG_INTERVAL_MS .. XE_BO_DEFRAG_INTERVAL_MAX_MS), and stops once the list drains.

- Observability and configuration: defrag GT stats, a DRM_XE_DEBUG_DEFRAG Kconfig option for debug logging, and Kconfig.profile options to tune the defrag thresholds/intervals.

Notes
=====

- This series initially targets system/GTT backing (XE_PL_TT) only.
  The TTM defrag plumbing is placement-agnostic, so VRAM-backed BOs could be defragmented too with future driver-side work (a VRAM beneficial-order notion and the corresponding migrate paths).

- Testing: memory was intentionally fragmented by a separate program at the moment a 3D benchmark was launched, combined with the beneficial-order fault injection. The benchmark's BOs are initially backed at a sub-optimal order and FPS starts out lower; shortly after, the defrag worker kicks in and iteratively migrates the backing to the beneficial order, and FPS climbs to its peak.

- The on-GPU two-pass copy, in particular the indirect -> indirect CCS aux copy across two distinct page sets, still wants wider hardware validation.

- The TTM defrag move and the Xe policy are deliberately split so the core plumbing can be reviewed independently of the driver's heuristics.

- Without a version of [1], this series will not break the shrinker feedback loop in Xe.

[1] https://patchwork.freedesktop.org/series/168651/

Cc: Carlos Santa <[email protected]>
Cc: Ryan Neph <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Simona Vetter <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Thomas Hellström <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>

Matthew Brost (12):
  drm/ttm/pool: Allow backing off reclaim at the beneficial order
  drm/ttm: Record sub-optimal page order allocations in ttm_tt
  drm/ttm: Support defragmentation moves
  drm/ttm: Add fault injection for beneficial-order allocation failures
  drm/xe: Track BOs backed at a sub-optimal page order
  drm/xe: Back off beneficial-order reclaim under defrag pressure
  drm/xe: Add xe_migrate_copy_defrag() for on-GPU defrag copies
  drm/xe: Handle defrag moves in xe_bo_move()
  drm/xe: Add a page defragmentation worker
  drm/xe: Add defrag GT stats
  drm/xe: Add DRM_XE_DEBUG_DEFRAG Kconfig option for debugging memory fragmentation
  drm/xe: Add Kconfig.profile options for BO defrag configuration

 drivers/gpu/drm/ttm/ttm_bo.c           |  83 +++++-
 drivers/gpu/drm/ttm/ttm_bo_util.c      |  16 +-
 drivers/gpu/drm/ttm/ttm_pool.c         |  49 +++-
 drivers/gpu/drm/ttm/ttm_tt.c           |   1 +
 drivers/gpu/drm/xe/Kconfig.debug       |  13 +
 drivers/gpu/drm/xe/Kconfig.profile     |  35 +++
 drivers/gpu/drm/xe/xe_bo.c             | 352 ++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_bo.h             |   2 +
 drivers/gpu/drm/xe/xe_bo_types.h       |   6 +
 drivers/gpu/drm/xe/xe_device.c         |   2 +
 drivers/gpu/drm/xe/xe_device_types.h   |  30 +++
 drivers/gpu/drm/xe/xe_gt_stats.c       |   3 +
 drivers/gpu/drm/xe/xe_gt_stats_types.h |  10 +
 drivers/gpu/drm/xe/xe_migrate.c        | 135 ++++++++--
 drivers/gpu/drm/xe/xe_migrate.h        |   8 +
 include/drm/ttm/ttm_bo.h               |  23 ++
 include/drm/ttm/ttm_tt.h               |   8 +
 17 files changed, 740 insertions(+), 36 deletions(-)

-- 
2.34.1
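The non-destructive defrag move described in the ttm core list above can be modeled in userspace C as follows. The types and functions (model_tt, model_bo, model_tt_create(), model_bo_defrag()) are hypothetical simplifications, not the TTM API; only the defrag_old_tt field name mirrors the series. The memcpy() stands in for the driver copy (done on the GPU by xe_migrate_copy_defrag() in Xe), and the old backing is freed synchronously instead of after a copy fence signals.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for ttm_tt and a TTM BO. */
struct model_tt {
	size_t size;
	unsigned char *data;          /* contents of the backing pages */
	bool beneficial_order_failed; /* backed below the beneficial order */
};

struct model_bo {
	struct model_tt *tt;
	struct model_tt *defrag_old_tt; /* mirrors bo->defrag_old_tt */
};

static struct model_tt *model_tt_create(size_t size, bool sub_optimal)
{
	struct model_tt *tt = calloc(1, sizeof(*tt));

	if (!tt)
		return NULL;
	tt->data = calloc(1, size);
	if (!tt->data) {
		free(tt);
		return NULL;
	}
	tt->size = size;
	tt->beneficial_order_failed = sub_optimal;
	return tt;
}

static void model_tt_destroy(struct model_tt *tt)
{
	if (tt) {
		free(tt->data);
		free(tt);
	}
}

/*
 * Re-back bo at the beneficial order. Returns 0 on success, or -ENOMEM with
 * the BO left exactly as it was when beneficial-order pages are unavailable.
 */
static int model_bo_defrag(struct model_bo *bo, bool beneficial_pages_free)
{
	struct model_tt *new_tt = NULL;

	assert(bo->tt && !bo->defrag_old_tt);

	/* Stash the old, still-populated tt; the BO is re-backed in place. */
	bo->defrag_old_tt = bo->tt;
	bo->tt = NULL;

	/* A defrag allocation never falls back below the beneficial order. */
	if (beneficial_pages_free)
		new_tt = model_tt_create(bo->defrag_old_tt->size, false);
	if (!new_tt) {
		/* Abort immediately and unwind to the original tt. */
		bo->tt = bo->defrag_old_tt;
		bo->defrag_old_tt = NULL;
		return -ENOMEM;
	}

	bo->tt = new_tt;
	/* Stands in for the driver's copy from the old pages to the new ones. */
	memcpy(new_tt->data, bo->defrag_old_tt->data, new_tt->size);

	/* The series frees the old backing only after the copy fence signals;
	 * this synchronous model can release it right away. */
	model_tt_destroy(bo->defrag_old_tt);
	bo->defrag_old_tt = NULL;
	return 0;
}
```

The key property is that a failed attempt leaves the BO pointing at its original backing with no stashed tt, so callers can simply retry later.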

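The scheduling policy of the page defragmentation worker described above can be sketched the same way. The MODEL_* constants are placeholders with made-up values standing in for XE_BO_DEFRAG_INTERVAL_MS, XE_BO_DEFRAG_INTERVAL_MAX_MS and XE_BO_DEFRAG_NUM_BO_LIMIT_PER_WORK, the per-device list is reduced to a counter, and the doubling factor, the early exit on failure and the reset after a clean run are assumptions of this sketch rather than what Xe necessarily does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values; stand-ins for XE_BO_DEFRAG_INTERVAL_MS,
 * XE_BO_DEFRAG_INTERVAL_MAX_MS and XE_BO_DEFRAG_NUM_BO_LIMIT_PER_WORK. */
#define MODEL_INTERVAL_MS      100u
#define MODEL_INTERVAL_MAX_MS  6400u
#define MODEL_BO_LIMIT_PER_RUN 4u

/* Hypothetical worker state: the per-device list reduced to a count. */
struct model_defrag_state {
	size_t pending;           /* BOs still waiting for a defrag move */
	unsigned int interval_ms; /* delay before the next worker run */
};

/* Add a sub-optimally backed BO. Returns true when the list went from
 * empty to non-empty, i.e. when the worker has to be kicked. */
static bool model_defrag_add(struct model_defrag_state *s)
{
	return s->pending++ == 0;
}

/*
 * One worker run. Each attempt succeeds only if beneficial-order pages are
 * free. At most MODEL_BO_LIMIT_PER_RUN BOs are tried; a failure ends the run
 * early and doubles the interval up to the maximum, while a clean run resets
 * it. Returns true if the worker should be re-queued after interval_ms,
 * false once the list has drained.
 */
static bool model_defrag_run(struct model_defrag_state *s,
			     bool beneficial_pages_free)
{
	bool failed = false;

	assert(s->interval_ms >= MODEL_INTERVAL_MS &&
	       s->interval_ms <= MODEL_INTERVAL_MAX_MS);

	for (unsigned int n = 0; n < MODEL_BO_LIMIT_PER_RUN && s->pending; n++) {
		if (!beneficial_pages_free) {
			failed = true;
			break;
		}
		s->pending--; /* re-backed at the beneficial order, off the list */
	}

	if (failed)
		s->interval_ms = s->interval_ms * 2 > MODEL_INTERVAL_MAX_MS ?
				 MODEL_INTERVAL_MAX_MS : s->interval_ms * 2;
	else
		s->interval_ms = MODEL_INTERVAL_MS;

	return s->pending != 0;
}
```

Starting from ten pending BOs, a run under fragmentation leaves all ten on the list and doubles the interval to 200 ms; later runs with beneficial-order pages free drain four BOs at a time, reset the interval, and return false once the list is empty.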