On Sat, Apr 10, 2021 at 12:56 AM Johannes Weiner <han...@cmpxchg.org> wrote:
>
> On Fri, Apr 09, 2021 at 08:29:47PM +0800, Muchun Song wrote:
> > Because memory allocations pinning memcgs for a long time - it exists
> > at a larger scale and is causing recurring problems in the real world:
> > page cache doesn't get reclaimed for a long time, or is used by the
> > second, third, fourth, ... instance of the same job that was restarted
> > into a new cgroup every time. Unreclaimable dying cgroups pile up,
> > waste memory, and make page reclaim very inefficient.
> >
> > We can convert LRU pages and most other raw memcg pins to the objcg
> > direction to fix this problem, and then the page->memcg will always
> > point to an object cgroup pointer.
> >
> > Therefore, the infrastructure of objcg no longer only serves
> > CONFIG_MEMCG_KMEM. In this patch, we move the infrastructure of the
> > objcg out of the scope of the CONFIG_MEMCG_KMEM so that the LRU pages
> > can reuse it to charge pages.
>
> Just an observation on this:
>
> We actually may want to remove CONFIG_MEMCG_KMEM altogether at this
> point. It used to be an optional feature, but nowadays it's not
> configurable anymore, and always on unless slob is configured.
>
> We've also added more than just slab accounting to it, like kernel
> stack pages, and it all gets disabled on slob configs just because it
> doesn't support slab object tracking.
>
> We could probably replace CONFIG_MEMCG_KMEM with CONFIG_MEMCG in most
> places, and add a couple of !CONFIG_SLOB checks in the slab callbacks.
>
> But that's beyond the scope of your patch series, so I'm also okay
> with this patch here.
>
> > We know that the LRU pages are not accounted at the root level. But the
> > page->memcg_data points to the root_mem_cgroup. So the page->memcg_data
> > of the LRU pages always points to a valid pointer. But the root_mem_cgroup
> > dose not have an object cgroup. If we use obj_cgroup APIs to charge the
> > LRU pages, we should set the page->memcg_data to a root object cgroup. So
> > we also allocate an object cgroup for the root_mem_cgroup and introduce
> > root_obj_cgroup to cache its value just like root_mem_cgroup.
> >
> > Signed-off-by: Muchun Song <songmuc...@bytedance.com>
>
> Overall, the patch makes sense to me. A few comments below:
>
> > @@ -252,9 +253,14 @@ struct cgroup_subsys_state *vmpressure_to_css(struct 
> > vmpressure *vmpr)
> >       return &container_of(vmpr, struct mem_cgroup, vmpressure)->css;
> >  }
> >
> > -#ifdef CONFIG_MEMCG_KMEM
> >  extern spinlock_t css_set_lock;
> >
> > +static inline bool obj_cgroup_is_root(struct obj_cgroup *objcg)
> > +{
>   > +   return objcg == root_obj_cgroup;
> > +}
>
> This function, and by extension root_obj_cgroup, aren't used by this
> patch. Please move them to the patch that adds users for them.

OK. Will do.

>
> > @@ -298,6 +304,20 @@ static void obj_cgroup_release(struct percpu_ref *ref)
> >       percpu_ref_exit(ref);
> >       kfree_rcu(objcg, rcu);
> >  }
> > +#else
> > +static void obj_cgroup_release(struct percpu_ref *ref)
> > +{
> > +     struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, 
> > refcnt);
> > +     unsigned long flags;
> > +
> > +     spin_lock_irqsave(&css_set_lock, flags);
> > +     list_del(&objcg->list);
> > +     spin_unlock_irqrestore(&css_set_lock, flags);
> > +
> > +     percpu_ref_exit(ref);
> > +     kfree_rcu(objcg, rcu);
> > +}
> > +#endif
>
> Having two separate functions for if and else is good when the else
> branch is a completely empty dummy function. In this case you end up
> duplicating code, so it's better to have just one function and put the
> ifdef around the nr_charged_bytes handling in it.

Make sense. I will rework the code here.

>
> > @@ -318,10 +338,14 @@ static struct obj_cgroup *obj_cgroup_alloc(void)
> >       return objcg;
> >  }
> >
> > -static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
> > -                               struct mem_cgroup *parent)
> > +static void memcg_reparent_objcgs(struct mem_cgroup *memcg)
> >  {
> >       struct obj_cgroup *objcg, *iter;
> > +     struct mem_cgroup *parent;
> > +
> > +     parent = parent_mem_cgroup(memcg);
> > +     if (!parent)
> > +             parent = root_mem_cgroup;
> >
> >       objcg = rcu_replace_pointer(memcg->objcg, NULL, true);
> >
> > @@ -342,6 +366,27 @@ static void memcg_reparent_objcgs(struct mem_cgroup 
> > *memcg,
> >       percpu_ref_kill(&objcg->refcnt);
> >  }
> >
> > +static int memcg_obj_cgroup_alloc(struct mem_cgroup *memcg)
> > +{
> > +     struct obj_cgroup *objcg;
> > +
> > +     objcg = obj_cgroup_alloc();
> > +     if (!objcg)
> > +             return -ENOMEM;
> > +
> > +     objcg->memcg = memcg;
> > +     rcu_assign_pointer(memcg->objcg, objcg);
> > +
> > +     return 0;
> > +}
> > +
> > +static void memcg_obj_cgroup_free(struct mem_cgroup *memcg)
> > +{
> > +     if (unlikely(memcg->objcg))
> > +             memcg_reparent_objcgs(memcg);
> > +}
>
> It's confusing to have a 'free' function not free the object it's
> called on.
>
> But rather than search for a fitting name, I think it might be better
> to just fold both of these short functions into their only callsites.

OK. Will do.

>
> Also, since memcg->objcg is reparented, and the pointer cleared, on
> offlining, when could this ever be non-NULL? This deserves a comment.

css_alloc() failed, offlining didn't happen. In this case, memcg->objcg
could be non-NULL (Just like memcg_free_kmem() dose). I will move
memcg_obj_cgroup_alloc() to the mem_cgroup_css_online() so that
we do not need memcg_obj_cgroup_free.

>
> > @@ -3444,7 +3489,6 @@ static u64 mem_cgroup_read_u64(struct 
> > cgroup_subsys_state *css,
> >  #ifdef CONFIG_MEMCG_KMEM
> >  static int memcg_online_kmem(struct mem_cgroup *memcg)
> >  {
> > -     struct obj_cgroup *objcg;
> >       int memcg_id;
> >
> >       if (cgroup_memory_nokmem)
> > @@ -3457,14 +3501,6 @@ static int memcg_online_kmem(struct mem_cgroup 
> > *memcg)
> >       if (memcg_id < 0)
> >               return memcg_id;
> >
> > -     objcg = obj_cgroup_alloc();
> > -     if (!objcg) {
> > -             memcg_free_cache_id(memcg_id);
> > -             return -ENOMEM;
> > -     }
> > -     objcg->memcg = memcg;
> > -     rcu_assign_pointer(memcg->objcg, objcg);
> > -
> >       static_branch_enable(&memcg_kmem_enabled_key);
> >
> >       memcg->kmemcg_id = memcg_id;
> > @@ -3488,7 +3524,7 @@ static void memcg_offline_kmem(struct mem_cgroup 
> > *memcg)
> >       if (!parent)
> >               parent = root_mem_cgroup;
> >
> > -     memcg_reparent_objcgs(memcg, parent);
> > +     memcg_reparent_objcgs(memcg);
>
> Since the objcg is no longer tied to kmem, this should move to
> mem_cgroup_css_offline() instead.

LGTM, will do.

Thanks for your all suggestions.

Reply via email to