>
> So in the GPU case, you'd charge on allocation, free objects into a
> cgroup-specific pool, and shrink using a cgroup-specific LRU
> list. Freed objects can be reused by this cgroup, but nobody else.
> They're reclaimed through memory pressure inside the cgroup, not due
> to the action of others. And all allocated memory is accounted for.
>
> I have to admit I'm pretty clueless about the gpu driver internals and
> can't really judge how feasible this is. But from a cgroup POV, if you
> want proper memory isolation between groups, it seems to me that's the
> direction you'd have to take this in.

I've been digging into this a bit today to try and work out what
various paths forward might look like, and I've run into a few
impedance mismatches.

1. TTM doesn't pool objects, it pools pages. TTM objects vary in size,
so pooling pages means we don't need the sort of special allocator
(size buckets etc.) that caching sized objects would require. list_lru
doesn't work on pages; if we were pooling the TTM objects themselves I
can see being able to use list_lru. Right now I'm seeing increased
complexity for no major return, though I might dig a bit more into
whether caching objects would help.

2. list_lru isn't suitable for pages. AFAICS we'd have to stick the
page into another object to store it in the list_lru, which would mean
allocating yet another wrapper object per page. Currently TTM uses the
page LRU pointer to add it to the shrinker_list, which is simple and
low overhead.
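
To make the trade-off in point 2 concrete, here is a rough userspace
sketch, not kernel code. All of the model_* types and the two pool_add_*
helpers are made up for illustration; they only mimic the shape of an
embedded list link versus a separately allocated wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for illustration; not the kernel definitions. */
struct model_list {
	struct model_list *next, *prev;
};

struct model_page {
	struct model_list lru;		/* link embedded in the page itself */
};

/* Hypothetical wrapper needed when the list cannot link pages directly. */
struct model_wrapper {
	struct model_list item;		/* link owned by the wrapper */
	struct model_page *page;	/* the page being tracked */
};

static void model_list_init(struct model_list *head)
{
	head->next = head->prev = head;
}

static void model_list_add_tail(struct model_list *node,
				struct model_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static size_t model_list_count(const struct model_list *head)
{
	const struct model_list *pos;
	size_t n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}

/* Pooling via the page's own link: no extra allocation. */
static void pool_add_intrusive(struct model_list *pool, struct model_page *p)
{
	assert(pool && p);
	model_list_add_tail(&p->lru, pool);
}

/* Pooling via a wrapper: one extra allocation per pooled page. */
static struct model_wrapper *pool_add_wrapped(struct model_list *pool,
					      struct model_page *p)
{
	struct model_wrapper *w;

	assert(pool && p);
	w = malloc(sizeof(*w));
	if (!w)
		return NULL;	/* adding to the pool can now fail */
	w->page = p;
	model_list_add_tail(&w->item, pool);
	return w;
}
```

The wrapped variant adds an allocation, a failure path and a pointer
chase for every pooled page, which is the overhead I'd like to avoid.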

If we wanted to stick with keeping pages in the pool, I do feel we'd
want to move the pool code closer to the mm core and have some sort of
more tightly integrated reclaim to avoid the overheads. In an ideal
world we'd get a page flag like PG_uncached, keep an uncached inactive
list per memcg/node, and migrate pages off it. But I don't think anyone
is willing to give us a page flag for this, so I think we need to find
a compromise that isn't ideal but works for us now. I've also played a
bit with the idea of a MEMCG_LOWOVERHEAD flag, which adds a shrinker to
the start of the shrinker list instead of the end, and registering the
TTM pool shrinker as one of those.
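
A rough userspace model of that head-of-list idea, purely as a sketch.
The model_* names, the low_overhead field and the freeable counter are
all hypothetical; this is not the real shrinker API, and it assumes
reclaim simply walks the list in order until it has freed enough.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a registered shrinker; not the kernel struct. */
struct model_shrinker {
	const char *name;
	int low_overhead;		/* stands in for a MEMCG_LOWOVERHEAD-style flag */
	unsigned long freeable;		/* pages this shrinker could give back */
	struct model_shrinker *next;
};

struct model_shrinker_list {
	struct model_shrinker *head, *tail;
};

/* Low-overhead shrinkers go to the front, everything else to the back. */
static void model_register(struct model_shrinker_list *l,
			   struct model_shrinker *s)
{
	assert(l && s);
	s->next = NULL;
	if (!l->head) {
		l->head = l->tail = s;
	} else if (s->low_overhead) {
		s->next = l->head;
		l->head = s;
	} else {
		l->tail->next = s;
		l->tail = s;
	}
}

/* Walk the list in order and stop once nr pages have been freed. */
static unsigned long model_reclaim(struct model_shrinker_list *l,
				   unsigned long nr)
{
	struct model_shrinker *s;
	unsigned long freed = 0;

	for (s = l->head; s && freed < nr; s = s->next) {
		unsigned long want = nr - freed;
		unsigned long take = s->freeable < want ? s->freeable : want;

		s->freeable -= take;
		freed += take;
	}
	return freed;
}
```

With the pool shrinker at the head, cheap-to-free pooled pages go back
first and the more expensive shrinkers are only reached if that isn't
enough.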

Have I missed anything here that might make this easier?

Dave.
