[AMD Official Use Only - AMD Internal Distribution Only] Hi Christian,
Thank you for the feedback. For “return ret < 0 ? ret : 0;”, it is equivalent to “return ret;” since ret is always <= 0 after the loop. For all other comments, I will revise the patch accordingly in v2. Regards Sam From: Koenig, Christian <christian.koe...@amd.com> Date: Monday, June 30, 2025 at 19:54 To: Zhang, GuoQing (Sam) <guoqing.zh...@amd.com>, raf...@kernel.org <raf...@kernel.org>, len.br...@intel.com <len.br...@intel.com>, pa...@kernel.org <pa...@kernel.org>, Deucher, Alexander <alexander.deuc...@amd.com>, Limonciello, Mario <mario.limoncie...@amd.com>, Lazar, Lijo <lijo.la...@amd.com> Cc: Zhao, Victor <victor.z...@amd.com>, Chang, HaiJun <haijun.ch...@amd.com>, Ma, Qing (Mark) <qing...@amd.com>, amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>, dri-de...@lists.freedesktop.org <dri-de...@lists.freedesktop.org>, linux...@vger.kernel.org <linux...@vger.kernel.org>, linux-ker...@vger.kernel.org <linux-ker...@vger.kernel.org> Subject: Re: [PATCH 1/3] drm/amdgpu: move GTT to SHM after eviction for hibernation On 30.06.25 12:41, Samuel Zhang wrote: > When hibernate with data center dGPUs, huge number of VRAM BOs evicted > to GTT and takes too much system memory. This will cause hibernation > fail due to insufficient memory for creating the hibernation image. > > Move GTT BOs to shmem in KMD, then shmem to swap disk in kernel > hibernation code to make room for hibernation image. This should probably be two patches, one for TTM and then an amdgpu patch to forward the event. > > Signed-off-by: Samuel Zhang <guoqing.zh...@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 13 ++++++++++++- > drivers/gpu/drm/ttm/ttm_resource.c | 18 ++++++++++++++++++ > include/drm/ttm/ttm_resource.h | 1 + > 3 files changed, 31 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index 4d57269c9ca8..5aede907a591 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -2889,6 +2889,7 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, > int amdgpu_ttm_evict_resources(struct amdgpu_device *adev, int mem_type) > { > struct ttm_resource_manager *man; > + int r; > > switch (mem_type) { > case TTM_PL_VRAM: > @@ -2903,7 +2904,17 @@ int amdgpu_ttm_evict_resources(struct amdgpu_device > *adev, int mem_type) > return -EINVAL; > } > > - return ttm_resource_manager_evict_all(&adev->mman.bdev, man); > + r = ttm_resource_manager_evict_all(&adev->mman.bdev, man); > + if (r) { > + DRM_ERROR("Failed to evict memory type %d\n", mem_type); > + return r; > + } > + if (adev->in_s4 && mem_type == TTM_PL_VRAM) { > + r = ttm_resource_manager_swapout(); > + if (r) > + DRM_ERROR("Failed to swap out, %d\n", r); > + } > + return r; > } > > #if defined(CONFIG_DEBUG_FS) > diff --git a/drivers/gpu/drm/ttm/ttm_resource.c > b/drivers/gpu/drm/ttm/ttm_resource.c > index fd41b56e2c66..07b1f5a5afc2 100644 > --- a/drivers/gpu/drm/ttm/ttm_resource.c > +++ b/drivers/gpu/drm/ttm/ttm_resource.c > @@ -534,6 +534,24 @@ void ttm_resource_manager_init(struct > ttm_resource_manager *man, > } > EXPORT_SYMBOL(ttm_resource_manager_init); > > +int ttm_resource_manager_swapout(void) This needs documentation, better placement and a better name. First of all put it into ttm_device.c instead of the resource manager. Then call it something like ttm_device_prepare_hibernation or similar. > +{ > + struct ttm_operation_ctx ctx = { > + .interruptible = false, > + .no_wait_gpu = false, > + .force_alloc = true > + }; > + int ret; > + > + while (true) { Make that: do { ret = ... } while (ret > 0); > + ret = ttm_global_swapout(&ctx, GFP_KERNEL); > + if (ret <= 0) > + break; > + } > + return ret; It's rather pointless to return the number of swapped out pages. Make that "return ret < 0 ? ret : 0; Regards, Christian. > +} > +EXPORT_SYMBOL(ttm_resource_manager_swapout); > + > /* > * ttm_resource_manager_evict_all > * > diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h > index b873be9597e2..46181758068e 100644 > --- a/include/drm/ttm/ttm_resource.h > +++ b/include/drm/ttm/ttm_resource.h > @@ -463,6 +463,7 @@ void ttm_resource_manager_init(struct > ttm_resource_manager *man, > > int ttm_resource_manager_evict_all(struct ttm_device *bdev, > struct ttm_resource_manager *man); > +int ttm_resource_manager_swapout(void); > > uint64_t ttm_resource_manager_usage(struct ttm_resource_manager *man); > void ttm_resource_manager_debug(struct ttm_resource_manager *man,