On 2020-07-22 09:46, Daniel Vetter wrote:
On Wed, Jul 22, 2020 at 9:19 AM Christian König
<christian.koe...@amd.com> wrote:
On 22.07.20 at 02:22, Gurchetan Singh wrote:
Of the desktop GPU drivers, i915's shrinker certainly supports purging
to swap.  TTM is a bit hard to follow.  I can't really tell if amdgpu
or nouveau supports that.  virtio-gpu is more commonly found on
systems with swap, so I think it should follow the desktop practices?

What we do at least in amdgpu, radeon, i915 and nouveau is to only allow it
for scanout, and that in turn is limited by the physical number of CRTCs on the
board.
Somewhat aside, but I'm not sure the ttm shrinker really works like
that. I think there are two parts:
1. kernel thread which takes buffers and unbinds them when we're over
the ttm global limit. This is the ttm_shrink_work stuff, and it only
shrinks if the zone is over a hard limit. Below that it just leaves
buffers pinned.

2. Actual core mm shrinker, which releases buffers held in cache by
ttm_page_alloc_dma.c. But that only happens when buffers have been
unbound by the first thread, so anything below those hard limits is
not shrinkable. And iirc those hard limits are like half of system
memory or so (last time I looked through this stuff at least).

No idea why exactly things are like they are, since the first thread
already does a dma_resv_trylock, and that's enough to avoid locking
inversions when being called from 2. Or well, should be at least, for
reasonable driver design.

The only other thing I'm seeing is the global lru, but that could be
fixed by having a per-device core mm shrinker instance which directly
shrinks the per-device lru. And then we just globally balance like
with all shrinkers through the core mm "shrink everyone equally"
approach. You can even keep the separate page alloc shrinker, since
core mm always loops over all shrinkers - we're not the only ones
where shrinking one cache makes more memory available for another
cache to shrink, e.g. you can't throw out an inode without first
throwing out all the dentries pointing at it.
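
To make the "one shrinker instance per device, balanced by core mm" idea
concrete, here is a rough userspace toy model. It is not kernel code and not
how ttm is implemented: the count/scan callback pair only loosely mirrors the
shape of the kernel shrinker interface, and every name in it (toy_shrinker,
toy_device, shrink_all, the divisor parameter) is made up for illustration.

```c
/* Userspace toy model; NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

struct toy_shrinker {
    /* How many objects could be freed right now. */
    unsigned long (*count)(struct toy_shrinker *s);
    /* Try to free up to nr objects; return how many were freed. */
    unsigned long (*scan)(struct toy_shrinker *s, unsigned long nr);
    void *priv;
};

/* Per-device LRU, reduced here to a counter of evictable buffers. */
struct toy_device {
    unsigned long lru_len;
};

static unsigned long dev_count(struct toy_shrinker *s)
{
    struct toy_device *dev = s->priv;
    return dev->lru_len;
}

static unsigned long dev_scan(struct toy_shrinker *s, unsigned long nr)
{
    struct toy_device *dev = s->priv;
    unsigned long freed = nr < dev->lru_len ? nr : dev->lru_len;

    dev->lru_len -= freed;
    return freed;
}

/*
 * One reclaim pass: ask every registered shrinker to give up the same
 * fraction (1/divisor, rounded up) of what it reports as freeable.
 * Returns the total number of objects freed.
 */
static unsigned long shrink_all(struct toy_shrinker *list, size_t n,
                                unsigned long divisor)
{
    unsigned long total = 0;

    assert(divisor > 0);
    for (size_t i = 0; i < n; i++) {
        unsigned long freeable = list[i].count(&list[i]);
        unsigned long target = (freeable + divisor - 1) / divisor;

        total += list[i].scan(&list[i], target);
    }
    return total;
}
```

In this model a device with a long lru gives up proportionally more than one
with a short lru, without any global lru being involved. A page-pool shrinker
would just be one more entry in the same list.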

Another problem would be allocating memory while holding per-device
lru locks (since trylock on such a global lock in shrinkers is a
really bad idea, we know that from all the dev->struct_mutex lolz in
i915). But for ttm that's not a problem since all lru locks are
spinlocks, so only GFP_ATOMIC is allowed anyway.

Adding Thomas for this ttm tangent.
-Daniel

Hmm, so yes, TTM's mechanism for avoiding 'DoS by pinning' is not really optimal, but nobody has really cared to improve it yet.

TTM allows a certain amount of memory to be pinned across all TTM drivers in a system. That accounting should include kmalloced memory for graphics objects, which at the time was intended to stop attacks like guessing a gem name and calling gem open repeatedly to pin all available kmalloc memory. Buffer object pages are also typically considered "pinned, but unpinnable" by this accounting. When the hard limit is reached, there is a direct reclaim that unbinds unpinnable pages from graphics and copies the contents to shmem, on which the shrinkers then may operate. There is also a lower soft limit above which a separate kernel thread copies memory to shmem. These limits are run-time configurable.
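
As a rough illustration of the soft/hard limit behaviour described above, here is a small userspace toy model. It is not TTM code: the struct, the function names and the idea of tracking a single "swappable" byte count are assumptions made up for this sketch, and the real accounting is per zone and more involved.

```c
/* Userspace toy model of the accounting described above; NOT TTM code.
 * All names and numbers are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_zone {
    size_t used;          /* bytes currently accounted as pinned */
    size_t soft_limit;    /* above this, background swapout is requested */
    size_t hard_limit;    /* accounting never goes above this */
    size_t swappable;     /* part of 'used' that can be copied to shmem */
    bool bg_work_pending; /* stands in for the separate kernel thread */
};

/* Move up to 'want' bytes out of the zone (to shmem); return bytes moved. */
static size_t toy_swapout(struct toy_zone *z, size_t want)
{
    size_t moved = want < z->swappable ? want : z->swappable;

    z->swappable -= moved;
    z->used -= moved;
    return moved;
}

/*
 * Account 'size' bytes. If that would cross the hard limit, do direct
 * reclaim first. Returns false (and changes nothing) if the hard limit
 * cannot be met even after swapping out everything swappable.
 */
static bool toy_account(struct toy_zone *z, size_t size, bool is_swappable)
{
    assert(z->soft_limit <= z->hard_limit);
    if (size > z->hard_limit)
        return false;
    if (z->used + size > z->hard_limit) {
        size_t need = z->used + size - z->hard_limit;

        if (z->swappable < need)
            return false;
        toy_swapout(z, need);
    }
    z->used += size;
    if (is_swappable)
        z->swappable += size;
    z->bg_work_pending = z->used > z->soft_limit;
    return true;
}

/* What the background thread would do: swap out down to the soft limit. */
static void toy_bg_work(struct toy_zone *z)
{
    if (z->used > z->soft_limit)
        toy_swapout(z, z->used - z->soft_limit);
    z->bg_work_pending = false;
}
```

In this model, anything accounted below the soft limit is never touched, which matches the point made earlier in the thread that memory below the limits is effectively not shrinkable.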

There are also a couple of page pools that were added to TTM to cache unused device-coherent pages and uncached / write-combined pages, since allocating such memory may be painfully slow. These page pools have their own shrinkers.

/Thomas

