This series adds a "page defragmentation" mechanism that lets a driver
re-back a buffer object with higher-order (beneficial-order) pages after
it was initially allocated with a sub-optimal, lower-order set of pages
under memory pressure. Long-lived objects that happened to be allocated
while memory was fragmented otherwise keep their scattered backing
forever, costing TLB efficiency for the lifetime of the object; this
series gives them a chance to be promoted back to the device's
beneficial order once memory is available again.

The TTM core grows the generic plumbing (order-failure tracking, a
defrag move that reallocates a populated BO in place, and a reclaim
backoff knob), and the Xe driver wires up the policy: it tracks the
affected BOs on a per-device list and runs a delayed worker that
periodically tries to defragment them on the GPU.

Overview
========

ttm core
--------
 - Record sub-optimal page-order allocations in ttm_tt. ttm_pool_alloc()
   sets ttm_tt::beneficial_order_failed when it has to fall back below
   the pool's beneficial order, so drivers can tell which BOs are backed
   with scattered pages.
 - Allow backing off reclaim at the beneficial order. A new
   ttm_operation_ctx::beneficial_reclaim_backoff flag makes the pool skip
   direct reclaim/compaction (and the __GFP_RETRY_MAYFAIL promotion) for
   the beneficial order, so a caller that knows beneficial-order
   allocations are currently failing doesn't burn CPU stalling for a
   contiguous page that isn't there.
 - Support defragmentation moves. ttm_operation_ctx::defrag forces a move
   that reallocates a populated BO's backing in place at the beneficial
   order, stashing the old, still-populated tt (bo->defrag_old_tt) and
   handing it to the driver move callback so the contents can be copied
   from the old pages to the new ones. The move is non-destructive: if
   beneficial-order pages can't be obtained the attempt aborts immediately
   and is fully unwound, with the BO restored to its original, still-mapped
   tt and placement.
 - Fault injection for beneficial-order allocation failures. A
   beneficial_order_fault_inject debugfs knob (mirroring
   backup_fault_inject) forces the beneficial-order allocation to fail, so
   the tracking and the driver defrag path can be exercised deterministically
   without driving the system into real fragmentation.
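
The interaction between the failure flag and the backoff flag can be
sketched as a small userspace model. This is an illustration only, not the
code in the series: apart from the two flag names quoted above, every
struct, field and function name below is hypothetical, and the model
assumes (as a simplification) that allocation starts at the beneficial
order and steps down one order at a time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bit of ttm_operation_ctx. */
struct model_ctx {
	bool beneficial_reclaim_backoff; /* skip reclaim at the beneficial order */
};

/* Hypothetical stand-in for the relevant bits of ttm_tt. */
struct model_tt {
	bool beneficial_order_failed; /* pool fell back below the beneficial order */
	unsigned int order_used;
};

/* Toy pool: which orders succeed is fixed up front (an assumption). */
struct model_pool {
	unsigned int beneficial_order;
	unsigned int max_free_order;    /* highest order available without reclaim */
	unsigned int max_reclaim_order; /* highest order available after reclaim */
	unsigned int reclaim_attempts;  /* counts the expensive attempts */
};

static bool model_try_order(struct model_pool *pool, unsigned int order,
			    bool may_reclaim)
{
	if (order <= pool->max_free_order)
		return true;
	if (!may_reclaim)
		return false;
	pool->reclaim_attempts++; /* models a reclaim/compaction stall */
	return order <= pool->max_reclaim_order;
}

static void model_pool_alloc(struct model_pool *pool,
			     const struct model_ctx *ctx, struct model_tt *tt)
{
	unsigned int order = pool->beneficial_order;

	tt->beneficial_order_failed = false;
	for (;;) {
		bool at_beneficial = order == pool->beneficial_order;
		/* Backoff only suppresses reclaim at the beneficial order. */
		bool may_reclaim = !(at_beneficial &&
				     ctx->beneficial_reclaim_backoff);

		if (model_try_order(pool, order, may_reclaim)) {
			tt->order_used = order;
			return;
		}
		assert(order > 0); /* order 0 always succeeds in this model */
		tt->beneficial_order_failed = true; /* record the fallback */
		order--;
	}
}
```

In this model, a pool with beneficial order 4 that can only supply order 2
ends up with beneficial_order_failed set either way, but setting
beneficial_reclaim_backoff saves the reclaim attempt at order 4.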

Xe driver
---------
 - Track BOs backed at a sub-optimal page order. ttm_bo_type_device BOs
   resident in XE_PL_TT with beneficial_order_failed set are kept on a
   per-device list (with an atomic count); membership is updated from the
   move path, and a BO is removed from the list when it is pinned.
 - Back off beneficial-order reclaim under defrag pressure. When enough
   BOs are already pending defrag, set beneficial_reclaim_backoff so new
   allocations fall back quickly instead of stalling.
 - Add xe_migrate_copy_defrag() for on-GPU defrag copies. Relocates a BO's
   contents from the old pages to the freshly reallocated ones entirely on
   the GPU, in up to two passes: a verbatim data copy with the compression
   PAT cleared, then (for compressed BOs) an indirect -> indirect CCS aux
   copy with the compression PAT set. The two passes are required because
   the data copy and the indirect CCS access need opposite compression PAT
   settings on the same page mapping.
 - Handle defrag moves in xe_bo_move(). Detects bo->defrag_old_tt, runs the
   GPU copy from the stashed old tt's sg table, and pipelines the teardown
   via ttm_bo_move_accel_cleanup() so the old backing is unpopulated/freed
   only once the copy fence signals. The defrag move is fully pipelined with
   fences: rather than stalling, it inserts the copy into the GPU execution
   pipeline and lets the BO's normal fence dependencies order it against
   concurrent work.
 - Add a page defragmentation worker. A delayed worker walks the per-device
   list and attempts a defrag move on each BO. It is kicked when the list
   first becomes non-empty, caps the BOs processed per run
   (XE_BO_DEFRAG_NUM_BO_LIMIT_PER_WORK) to avoid starving active work,
   backs off exponentially on failure
   (XE_BO_DEFRAG_INTERVAL_MS .. XE_BO_DEFRAG_INTERVAL_MAX_MS), and stops
   once the list drains.
 - Observability and configuration: defrag GT stats, a DRM_XE_DEBUG_DEFRAG
   Kconfig option for debug logging, and Kconfig.profile options to tune the
   defrag thresholds/intervals.
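
The worker's scheduling policy (kick on first entry, per-run cap,
exponential backoff, stop when drained) can be modelled roughly as below.
This is a sketch under stated assumptions, not the driver code: the MODEL_*
constants are made-up stand-ins for XE_BO_DEFRAG_INTERVAL_MS,
XE_BO_DEFRAG_INTERVAL_MAX_MS and XE_BO_DEFRAG_NUM_BO_LIMIT_PER_WORK, all
names are hypothetical, a run is treated as all-or-nothing, and resetting
the interval after a successful run is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up values; the real ones come from Kconfig.profile. */
#define MODEL_INTERVAL_MS       100u
#define MODEL_INTERVAL_MAX_MS  1600u
#define MODEL_BO_LIMIT_PER_RUN    4u

struct model_worker {
	size_t pending;           /* BOs on the per-device list */
	unsigned int interval_ms; /* delay until the next run; 0 = stopped */
};

/* Add a BO to the list; kick the worker if the list was empty. */
static void model_track_bo(struct model_worker *w)
{
	if (w->pending++ == 0)
		w->interval_ms = MODEL_INTERVAL_MS;
}

/*
 * One worker run. pages_available simulates whether beneficial-order
 * pages can be obtained right now. Returns the number of BOs defragmented.
 */
static size_t model_worker_run(struct model_worker *w, int pages_available)
{
	size_t budget = w->pending < MODEL_BO_LIMIT_PER_RUN ?
			w->pending : MODEL_BO_LIMIT_PER_RUN;
	size_t done = pages_available ? budget : 0;

	/* Invariant of the model: the worker is armed iff work is pending. */
	assert((w->pending == 0) == (w->interval_ms == 0));
	w->pending -= done;

	if (w->pending == 0) {
		w->interval_ms = 0; /* list drained: stop */
	} else if (done == 0) {
		unsigned int next = w->interval_ms * 2; /* failure: back off */

		w->interval_ms = next > MODEL_INTERVAL_MAX_MS ?
				 MODEL_INTERVAL_MAX_MS : next;
	} else {
		w->interval_ms = MODEL_INTERVAL_MS; /* progress: base interval */
	}
	return done;
}
```

With six tracked BOs and repeated failures, the model's interval doubles
from the base value until it saturates at the maximum; once pages become
available it drains the list in two capped runs and then stops.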

Notes
=====
 - This series initially targets system/GTT backing (XE_PL_TT) only. The
   TTM defrag plumbing is placement-agnostic, so VRAM-backed BOs could be
   defragmented too with future driver-side work (a VRAM beneficial-order
   notion and the corresponding migrate paths).
 - Testing: a separate program intentionally fragments memory at the
   moment a 3D benchmark launches, combined with the beneficial-order
   fault injection. The benchmark's BOs are initially backed at a
   sub-optimal order and FPS starts lower; shortly after, the defrag worker
   kicks in and iteratively migrates the backing to the beneficial order,
   and FPS climbs to its peak.
 - The on-GPU two-pass copy, in particular the indirect -> indirect CCS aux
   copy across two distinct page sets, still wants wider hardware validation.
 - The TTM defrag move and the Xe policy are deliberately split so the core
   plumbing can be reviewed independently of the driver's heuristics.
 - Without a version of [1] applied, this series will not break the
   shrinker feedback loop in Xe.

[1] https://patchwork.freedesktop.org/series/168651/

Cc: Carlos Santa <[email protected]>
Cc: Ryan Neph <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Matthew Auld <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Maxime Ripard <[email protected]>
Cc: Thomas Zimmermann <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Simona Vetter <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Thomas Hellström <[email protected]>
Signed-off-by: Matthew Brost <[email protected]>

Matthew Brost (12):
  drm/ttm/pool: Allow backing off reclaim at the beneficial order
  drm/ttm: Record sub-optimal page order allocations in ttm_tt
  drm/ttm: Support defragmentation moves
  drm/ttm: Add fault injection for beneficial-order allocation failures
  drm/xe: Track BOs backed at a sub-optimal page order
  drm/xe: Back off beneficial-order reclaim under defrag pressure
  drm/xe: Add xe_migrate_copy_defrag() for on-GPU defrag copies
  drm/xe: Handle defrag moves in xe_bo_move()
  drm/xe: Add a page defragmentation worker
  drm/xe: Add defrag GT stats
  drm/xe: Add DRM_XE_DEBUG_DEFRAG Kconfig option for debugging memory
    fragmentation
  drm/xe: Add Kconfig.profile options for BO defrag configuration

 drivers/gpu/drm/ttm/ttm_bo.c           |  83 +++++-
 drivers/gpu/drm/ttm/ttm_bo_util.c      |  16 +-
 drivers/gpu/drm/ttm/ttm_pool.c         |  49 +++-
 drivers/gpu/drm/ttm/ttm_tt.c           |   1 +
 drivers/gpu/drm/xe/Kconfig.debug       |  13 +
 drivers/gpu/drm/xe/Kconfig.profile     |  35 +++
 drivers/gpu/drm/xe/xe_bo.c             | 352 ++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_bo.h             |   2 +
 drivers/gpu/drm/xe/xe_bo_types.h       |   6 +
 drivers/gpu/drm/xe/xe_device.c         |   2 +
 drivers/gpu/drm/xe/xe_device_types.h   |  30 +++
 drivers/gpu/drm/xe/xe_gt_stats.c       |   3 +
 drivers/gpu/drm/xe/xe_gt_stats_types.h |  10 +
 drivers/gpu/drm/xe/xe_migrate.c        | 135 ++++++++--
 drivers/gpu/drm/xe/xe_migrate.h        |   8 +
 include/drm/ttm/ttm_bo.h               |  23 ++
 include/drm/ttm/ttm_tt.h               |   8 +
 17 files changed, 740 insertions(+), 36 deletions(-)

-- 
2.34.1
