On Mon 01-08-16 16:13:08, Vladimir Davydov wrote:
> Radix trees may be used not only for storing page cache pages, so
> unconditionally accounting radix tree nodes to the current memory cgroup
> is bad: if a radix tree node is used for storing data shared among
> different cgroups we risk pinning dead memory cgroups forever. So let's
> only account radix tree nodes if it was explicitly requested by passing
> __GFP_ACCOUNT to INIT_RADIX_TREE. Currently, we only want to account
> page cache entries, so mark mapping->page_tree so.
> 
> Signed-off-by: Vladimir Davydov <[email protected]>

OK, the patch makes sense to me. Such a false sharing would be really
tedious to debug

Do we want to mark it for stable 4.6 to prevent from some pathological
issues. The patch is simple enough.

Acked-by: Michal Hocko <[email protected]>

> ---
>  fs/inode.c       |  2 +-
>  lib/radix-tree.c | 14 ++++++++++----
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 559a9da25237..1d04dab5211c 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -345,7 +345,7 @@ EXPORT_SYMBOL(inc_nlink);
>  void address_space_init_once(struct address_space *mapping)
>  {
>       memset(mapping, 0, sizeof(*mapping));
> -     INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC);
> +     INIT_RADIX_TREE(&mapping->page_tree, GFP_ATOMIC | __GFP_ACCOUNT);
>       spin_lock_init(&mapping->tree_lock);
>       init_rwsem(&mapping->i_mmap_rwsem);
>       INIT_LIST_HEAD(&mapping->private_list);
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 61b8fb529cef..1b7bf7314141 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -277,10 +277,11 @@ radix_tree_node_alloc(struct radix_tree_root *root)
>  
>               /*
>                * Even if the caller has preloaded, try to allocate from the
> -              * cache first for the new node to get accounted.
> +              * cache first for the new node to get accounted to the memory
> +              * cgroup.
>                */
>               ret = kmem_cache_alloc(radix_tree_node_cachep,
> -                                    gfp_mask | __GFP_ACCOUNT | __GFP_NOWARN);
> +                                    gfp_mask | __GFP_NOWARN);
>               if (ret)
>                       goto out;
>  
> @@ -303,8 +304,7 @@ radix_tree_node_alloc(struct radix_tree_root *root)
>               kmemleak_update_trace(ret);
>               goto out;
>       }
> -     ret = kmem_cache_alloc(radix_tree_node_cachep,
> -                            gfp_mask | __GFP_ACCOUNT);
> +     ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
>  out:
>       BUG_ON(radix_tree_is_internal_node(ret));
>       return ret;
> @@ -351,6 +351,12 @@ static int __radix_tree_preload(gfp_t gfp_mask, int nr)
>       struct radix_tree_node *node;
>       int ret = -ENOMEM;
>  
> +     /*
> +      * Nodes preloaded by one cgroup can be be used by another cgroup, so
> +      * they should never be accounted to any particular memory cgroup.
> +      */
> +     gfp_mask &= ~__GFP_ACCOUNT;
> +
>       preempt_disable();
>       rtp = this_cpu_ptr(&radix_tree_preloads);
>       while (rtp->nr < nr) {
> -- 
> 2.1.4
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected].  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]";> [email protected] </a>

-- 
Michal Hocko
SUSE Labs

Reply via email to