On Thu, 8 May 2025 at 08:03, Dave Airlie <airl...@gmail.com> wrote:
>
> On Thu, 8 May 2025 at 03:52, Johannes Weiner <han...@cmpxchg.org> wrote:
> >
> > Hello Dave,
> >
> > On Fri, May 02, 2025 at 01:35:59PM +1000, Dave Airlie wrote:
> > > Hey all,
> > >
> > > This is my second attempt at adding the initial simple memcg/ttm
> > > integration.
> > >
> > > This varies from the first attempt in two major ways:
> > >
> > > 1. Instead of using __GFP_ACCOUNT and calling the kmem charge
> > > functions directly for pool memory, and directly updating the GPU
> > > statistic, Waiman suggested I just do what the network socket code
> > > does, which looks simpler. So this adds two new memcg APIs that
> > > wrap the accounting. The pages no longer get a memcg assigned; the
> > > charge is owned by the larger BO object, which makes more sense.
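
To make that concrete: the wrappers are modeled on the networking
helpers mem_cgroup_charge_skmem()/mem_cgroup_uncharge_skmem() in
mm/memcontrol.c. Roughly the shape below; the _gpu names and the
MEMCG_GPU stat are illustrative, not necessarily what's in the series:

/* illustrative sketch, modeled on the skmem helpers; MEMCG_GPU
 * and the *_gpu names are made up */
bool mem_cgroup_charge_gpu(struct mem_cgroup *memcg, unsigned int nr_pages)
{
	if (try_charge(memcg, GFP_KERNEL, nr_pages))
		return false;

	mod_memcg_state(memcg, MEMCG_GPU, nr_pages);
	return true;
}

void mem_cgroup_uncharge_gpu(struct mem_cgroup *memcg, unsigned int nr_pages)
{
	mod_memcg_state(memcg, MEMCG_GPU, -(long)nr_pages);
	refill_stock(memcg, nr_pages);
}
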
> >
> > Unfortunately, this was bad advice :(
> >
> > Naked-charging like this is quite awkward from the memcg side. It
> > requires consumer-specific charge paths in the memcg code, adds stat
> > counters that are memcg-only with no system-wide equivalent, and it's
> > difficult for the memcg maintainers to keep track of the link between
> > what's in the counter and the actual physical memory that is supposed
> > to be tracked.
> >
> > The network and a few others like it are rather begrudging exceptions
> > because they do not have a suitable page context or otherwise didn't
> > fit the charging scheme. They're not good examples to follow if it can
> > at all be avoided.
>
> I unfortunately feel the GPU might fit into this category as well: we
> aren't allocating pages in the traditional manner, so we have a number
> of problems doing the accounting at the page level.
>
> I think the biggest barrier to doing page-level tracking is the page
> pools we have to keep. When we need CPU-uncached pages, we allocate a
> bunch of pages and do the work to change their CPU mappings to
> uncached. That work is costly, so when we no longer need the uncached
> pages in an object, we return them to a pool rather than to the
> central allocator. We then use a shrinker to empty the pool and undo
> the CPU mapping change. We also keep equivalent pools for DMA-able and
> 32-bit DMA-able pages if they are used.
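
In rough pseudo-kernel-C the pool behaviour looks like this
(simplified shape, not the actual drivers/gpu/drm/ttm/ttm_pool.c code;
the pool_take_page()/pool_give_page() helpers are made up, while
set_pages_uc()/set_pages_wb() are the x86 mapping-attribute helpers):

struct page *pool_alloc_uncached(struct list_head *pool)
{
	struct page *p = pool_take_page(pool);	/* reuse if we can */

	if (!p) {
		p = alloc_pages(GFP_KERNEL, 0);
		if (p)
			set_pages_uc(p, 1);	/* the costly part */
	}
	return p;
}

void pool_free_uncached(struct list_head *pool, struct page *p)
{
	/* keep the uncached mapping; the pool shrinker later does
	 * set_pages_wb() and returns the page to the allocator */
	pool_give_page(pool, p);
}
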
>
> This means that if userspace allocates a large uncached object and we
> account it to the current memcg with __GFP_ACCOUNT, then when the
> pages get handed back to the pool we have to remove them from the
> memcg accounting. This is where I used the memcg kmem
> charges/uncharges: uncharge on adding to the pool and charge on reuse.
> However, kmem seems like the wrong place to charge, even though it's
> where the initial __GFP_ACCOUNT will put those pages. This is why the
> network solution worked better for me: I can do all the work outside
> the pool code at a slightly higher level (the same level where we
> charge/uncharge dmem), and I don't have to worry about handling the
> different pool types etc. I also don't think we need page->memcg to be
> set for these pages, since every page is part of a larger object, and
> it is the object that gets swapped or freed, not parts of it.
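
In sketch form (all names made up), the accounting then sits at the
object level, outside the pool code:

static int gpu_bo_populate(struct gpu_bo *bo)
{
	if (!mem_cgroup_charge_gpu(bo->memcg, bo->num_pages))
		return -ENOMEM;

	return bo_fill_pages_from_pool(bo);	/* pool hit or fresh
						 * alloc, no per-page
						 * memcg work */
}

static void gpu_bo_unpopulate(struct gpu_bo *bo)
{
	bo_return_pages_to_pool(bo);	/* pages keep their uncached
					 * mapping while in the pool */
	mem_cgroup_uncharge_gpu(bo->memcg, bo->num_pages);
}
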

With all that said, we also manage moving objects to swap, and for
proper accounting of swapping I may need the more detailed, integrated
approach at the lower levels, so that objects that have been swapped
out get removed from the GPU counter and added to the normal swap
counters.
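
Hand-waving, it would be something like this (copy_bo_to_shmem() is a
made-up helper; the shmem pages get charged and counted as ordinary
swap-backed memory by the core MM):

static int gpu_bo_swapout(struct gpu_bo *bo)
{
	int ret = copy_bo_to_shmem(bo);

	if (ret)
		return ret;

	/* hand accounting over: drop the GPU-side charge, the shmem
	 * copy is now tracked by the normal memcg/swap counters */
	mem_cgroup_uncharge_gpu(bo->memcg, bo->num_pages);
	return 0;
}
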

Dave.

>
> >
> > __GFP_ACCOUNT and an enum node_stat_item is the much preferred way. I
> > have no objections to exports if you need to charge and account memory
> > from a module.
>
> Now maybe it makes sense to add a node_stat_item for this as well as
> the GPU memcg accounting, so we have it accounted in both places.
> Right now the MM has no statistical insight into this memory at all,
> other than it being allocated pages.
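
Concretely that could look something like the below, where
NR_GPU_PAGES is a made-up enum node_stat_item entry:

/* __GFP_ACCOUNT covers the memcg side; mod_node_page_state()
 * covers the system-wide vmstat side. NR_GPU_PAGES is hypothetical. */
static struct page *gpu_alloc_page(void)
{
	struct page *p = alloc_pages(GFP_KERNEL | __GFP_ACCOUNT, 0);

	if (p)
		mod_node_page_state(page_pgdat(p), NR_GPU_PAGES, 1);
	return p;
}
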
>
> Dave.
