When writing a "max" limit lower than the current usage, the
existing code silently failed. This series improves on that by
returning -EBUSY on failure, and by first attempting to
synchronously reclaim device memory to push usage below the new
max limit so that the error can be avoided.

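The intended semantics can be modelled in a few lines of userspace C. This is a hypothetical sketch, not the dmem controller code: struct region, reclaim_fn and region_set_max() are illustrative names, and keeping the old max unchanged when -EBUSY is returned is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct region;

/* Hypothetical reclaim callback: try to bring usage down to @target. */
typedef void (*reclaim_fn)(struct region *r, uint64_t target);

/* Hypothetical model of a dmem region; not the kernel's structure. */
struct region {
	uint64_t usage;     /* bytes currently charged */
	uint64_t max;       /* current "max" limit in bytes */
	reclaim_fn reclaim; /* NULL if the driver registered no callback */
};

/*
 * Set a new max limit. If it is below current usage, first ask the
 * driver to reclaim synchronously; fail with -EBUSY only if usage is
 * still above the new limit afterwards (old max is kept in that case).
 */
static int region_set_max(struct region *r, uint64_t new_max)
{
	if (r->usage > new_max && r->reclaim) {
		uint64_t before = r->usage;

		r->reclaim(r, new_max);
		/* A reclaim callback must never increase usage. */
		assert(r->usage <= before);
	}

	if (r->usage > new_max)
		return -EBUSY;

	r->max = new_max;
	return 0;
}

/* Example callback: pretend the driver can evict everything. */
static void evict_all(struct region *r, uint64_t target)
{
	(void)target;
	r->usage = 0;
}
```

With no callback registered, lowering max below usage fails with -EBUSY; with a callback that frees enough memory, the same write succeeds.
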
Patch 1 fixes a pre-existing amdgpu_vram_mgr_init() error-path bug.
Patch 2 implements and documents a reclaim callback interface
for the dmem controller.
Patch 3 implements a TTM reclaim callback.
Patches 4-5 hook up the reclaim callback to the dmem-cgroup-aware
drivers xe and amdgpu.

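On the driver side, a TTM-style reclaim callback essentially walks the buffers in the affected region and evicts them until usage fits the target. The sketch below models that with a plain array as the LRU list; struct bo, struct vram_mgr and vram_reclaim() are hypothetical, and real TTM eviction (locking, busy and pinned buffers, moving contents to system memory) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer object on a per-region LRU list. */
struct bo {
	uint64_t size;
	int resident; /* 1 while the buffer occupies device memory */
};

/* Hypothetical VRAM manager; not the xe or amdgpu structure. */
struct vram_mgr {
	struct bo *lru;  /* array ordered least- to most-recently used */
	size_t nr_bos;
	uint64_t usage;  /* sum of sizes of resident buffers */
};

/*
 * Evict buffers in LRU order until usage is at or below @target.
 * Returns the number of bytes freed. Eviction is modelled by simply
 * clearing the resident flag.
 */
static uint64_t vram_reclaim(struct vram_mgr *mgr, uint64_t target)
{
	uint64_t freed = 0;

	for (size_t i = 0; i < mgr->nr_bos && mgr->usage > target; i++) {
		struct bo *bo = &mgr->lru[i];

		if (!bo->resident)
			continue;
		assert(mgr->usage >= bo->size);
		bo->resident = 0; /* "evict" to system memory */
		mgr->usage -= bo->size;
		freed += bo->size;
	}
	return freed;
}
```

Evicting oldest-first mirrors the usual LRU policy; the loop stops as soon as the target is met, so recently used buffers stay resident when possible.
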
v2:
- Remove the error propagation that was in a previous series (Maarten)
- A number of updates in patch 1 (patch 2 as of v3). See its commit
message for details. (Maarten)
v3:
- Add patch 1 fixing a pre-existing amdgpu_vram_mgr_init() error path
bug where drmm_cgroup_register_region() was called before
INIT_LIST_HEAD() and gpu_buddy_init(), causing a kernel panic on
failure. (Sashiko-bot)
- Use an rwsem to protect reclaim callback registration and region
unregister against concurrent reclaim invocations. (Sashiko-bot)
- Fix ttm_resource_manager_set_dmem_region() storing an error pointer
in man->cg unconditionally. (Sashiko-bot)
- Fix kernel-doc function name format for ttm_bo_evict_cgroup() and
ttm_resource_manager_set_dmem_region().
v4:
- Rebased on drm-tip; dropped the XE_PL_STOLEN guard in the xe patch
as stolen memory uses a separate TTM manager.
User-space tests are at
https://patchwork.freedesktop.org/series/163935/
Test-with: [email protected]
Thomas Hellström (5):
drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init()
cgroup/dmem: Add reclaim callback for lowering max below current usage
drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem
controller
drm/xe: Wire up dmem cgroup reclaim for VRAM manager
drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 10 +-
drivers/gpu/drm/ttm/ttm_bo.c | 95 ++++++++++++++++-
drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +-
drivers/gpu/drm/ttm/ttm_resource.c | 37 +++++++
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 14 ++-
include/drm/ttm/ttm_bo.h | 10 ++
include/drm/ttm/ttm_resource.h | 4 +
include/linux/cgroup_dmem.h | 24 +++++
kernel/cgroup/dmem.c | 106 +++++++++++++++++--
10 files changed, 283 insertions(+), 22 deletions(-)
--
2.54.0