Re: [Regression] 3.19-rc3 : memcg: Hang in mount memcg

2015-01-23 Thread Suzuki K. Poulose

On 22/01/15 13:45, Johannes Weiner wrote:
> On Wed, Jan 21, 2015 at 04:39:55PM +0000, Will Deacon wrote:
> > On Mon, Jan 19, 2015 at 12:51:27PM +0000, Suzuki K. Poulose wrote:
> > > On 10/01/15 08:55, Vladimir Davydov wrote:
> > > > The problem is that the memory cgroup controller takes a css reference
> > > > for each charged page and does not reparent charged pages on css
> > > > offline, while cgroup_mount/cgroup_kill_sb expect all css references to
> > > > offline cgroups to be gone soon, restarting the syscall if the ref count
> > > > != 0. As a result, if you create a memory cgroup, charge some page cache
> > > > to it, and then remove it, unmount/mount will hang forever.
> > > >
> > > > Maybe we should kill the ref counter to the memory controller root in
> > > > cgroup_kill_sb only if there are no children at all, neither online nor
> > > > offline.
> > >
> > > Still reproducible on 3.19-rc5 with the same setup.
> >
> > Yeah, I'm seeing the same failure on my setup too.
> >
> > > From git bisect, the last good commit is:
> > >
> > > commit 8df0c2dcf61781d2efa8e6e5b06870f6c6785735
> > > Author: Pranith Kumar
> > > Date:   Wed Dec 10 15:42:28 2014 -0800
> > >
> > >     slab: replace smp_read_barrier_depends() with lockless_dereference()
> >
> > So that points at 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
> > as the offending commit.
>
> With b2052564e66d ("mm: memcontrol: continue cache reclaim from
> offlined groups"), page cache can pin an old css and its ancestors
> indefinitely, making that hang in a second mount() very likely.
>
> However, swap entries have also been doing that for quite a while now,
> and as Vladimir pointed out, the same is true for kernel memory.  This
> latest change just makes this existing bug easier to trigger.
>
> I think we have to update the lifetime rules to reflect reality here:
> memory and swap lifetime is indefinite, so once the memory controller
> is used, it has state that is independent from whether it's mounted or
> not.  We can support an identical remount, but have to fail mounting
> with new parameters that would change the behavior of the controller.
>
> Suzuki, Will, could you give the following patch a shot?
>
> Tejun, would that route be acceptable to you?
>
> Thanks
>
> ---
> From c5e88d02d185c52748df664aa30a2c5f8949b0f7 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner
> Date: Thu, 22 Jan 2015 08:16:31 -0500
> Subject: [patch] kernel: cgroup: prevent mount hang due to memory controller
>  lifetime
>
> Don't offline the controller root as long as there are any children,
> dead or alive.  A remount will no longer wait for these old references
> to drain; it will simply mount the persistent controller state again.
>
> Reported-by: "Suzuki K. Poulose"
> Reported-by: Will Deacon
> Signed-off-by: Johannes Weiner

This one fixes the issue.

Tested-by: Suzuki K. Poulose

Thanks
Suzuki
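
The hang discussed in this thread (cgroup_mount() restarting for as long as references to offline cgroups remain) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name below (struct model_root, model_kill_sb(), model_mount() and so on) is hypothetical, the capped retry loop merely stands in for restart_syscall(), and none of it is the actual kernel/cgroup.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a hierarchy root (not kernel code). */
struct model_root {
	int refcnt;   /* refs held by the mount and by dying child cgroups */
	bool killed;  /* set once unmount has started tearing the root down */
};

enum { MODEL_OK = 0, MODEL_RESTART = 1 };

/*
 * Assumed pre-fix unmount behaviour: drop the mount's reference and
 * start killing the root unless an *online* child exists.  Offline
 * children still pinned by page cache do not count.
 */
static void model_kill_sb(struct model_root *r, bool has_online_children)
{
	assert(r->refcnt > 0);
	r->refcnt--;
	if (!has_online_children)
		r->killed = true;
}

/*
 * One mount attempt: a root that is being killed but is still
 * referenced cannot be reused yet, so the caller has to retry.
 */
static int model_mount_attempt(const struct model_root *old)
{
	if (old->killed && old->refcnt > 0)
		return MODEL_RESTART;
	return MODEL_OK;
}

/*
 * Retry the way a restarted syscall would, but stop after max_tries
 * so the model terminates.  Returns the attempt number that succeeded,
 * or -1 if every attempt asked for a restart.
 */
static int model_mount(const struct model_root *old, int max_tries)
{
	for (int i = 1; i <= max_tries; i++)
		if (model_mount_attempt(old) == MODEL_OK)
			return i;
	return -1;
}
```

For example, a root holding the mount reference plus one pin from a removed child (refcnt = 2) is left killed but still referenced after model_kill_sb(&r, false), so model_mount(&r, 100) returns -1. With no leftover pins (refcnt = 1) the first attempt succeeds, which is consistent with the observation that mounting from a cold boot works.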



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [Regression] 3.19-rc3 : memcg: Hang in mount memcg

2015-01-22 Thread Tejun Heo
On Thu, Jan 22, 2015 at 10:19:43AM -0500, Johannes Weiner wrote:
> From 3d7ae5aeb16ce6118d8bff17194e791339a1f06c Mon Sep 17 00:00:00 2001
> From: Johannes Weiner 
> Date: Thu, 22 Jan 2015 08:16:31 -0500
> Subject: [patch] kernel: cgroup: prevent mount hang due to memory controller
>  lifetime
> 
> Since b2052564e66d ("mm: memcontrol: continue cache reclaim from
> offlined groups"), re-mounting the memory controller after using it is
> very likely to hang.
> 
> The cgroup core assumes that any remaining references after deleting a
> cgroup are temporary in nature, and synchronously waits for them, but
> the above-mentioned commit has left-over page cache pin its css until
> it is reclaimed naturally.  That being said, swap entries and charged
> kernel memory have been doing the same indefinite pinning forever; the
> bug is just more likely to trigger with left-over page cache.
> 
> Reparenting kernel memory is highly impractical, which leaves changing
> the cgroup assumptions to reflect this: once a controller has been
> mounted and used, it has internal state that is independent from mount
> and cgroup lifetime.  It can be unmounted and remounted, but it can't
> be reconfigured during subsequent mounts.
> 
> Don't offline the controller root as long as there are any children,
> dead or alive.  A remount will no longer wait for these old references
> to drain; it will simply mount the persistent controller state again.
> 
> Reported-by: "Suzuki K. Poulose" 
> Reported-by: Will Deacon 
> Signed-off-by: Johannes Weiner 

Applied to cgroup/for-3.19-fixes.

Thanks.

-- 
tejun


Re: [Regression] 3.19-rc3 : memcg: Hang in mount memcg

2015-01-22 Thread Johannes Weiner
Hi,

On Thu, Jan 22, 2015 at 09:34:54AM -0500, Tejun Heo wrote:
> On Thu, Jan 22, 2015 at 08:45:50AM -0500, Johannes Weiner wrote:
> > diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> > index bb263d0caab3..9a09308c8066 100644
> > --- a/kernel/cgroup.c
> > +++ b/kernel/cgroup.c
> > @@ -1819,8 +1819,11 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
> >  			goto out_unlock;
> >  		}
> >  
> > -		if (root->flags ^ opts.flags)
> > -			pr_warn("new mount options do not match the existing superblock, will be ignored\n");
> > +		if (root->flags ^ opts.flags) {
> > +			pr_warn("new mount options do not match the existing superblock\n");
> > +			ret = -EBUSY;
> > +			goto out_unlock;
> > +		}
> 
> Do we really need the above chunk?

Inform and ignore or fail hard?  I guess we can drop this hunk and
keep the current behavior.

> > @@ -1909,7 +1912,7 @@ static void cgroup_kill_sb(struct super_block *sb)
> >  	 *
> >  	 * And don't kill the default root.
> >  	 */
> > -	if (css_has_online_children(&root->cgrp.self) ||
> > +	if (!list_empty(&root->cgrp.self.children) ||
> >  	    root == &cgrp_dfl_root)
> >  		cgroup_put(&root->cgrp);
> 
> I tried to do something a bit more advanced so that eventual async
> release of dying children, if they happen, can also release the
> hierarchy but I don't think it really matters unless we can forcefully
> drain.  So, shouldn't just the above part be enough?

Yep, I'd be fine with that.

---

From 3d7ae5aeb16ce6118d8bff17194e791339a1f06c Mon Sep 17 00:00:00 2001
From: Johannes Weiner 
Date: Thu, 22 Jan 2015 08:16:31 -0500
Subject: [patch] kernel: cgroup: prevent mount hang due to memory controller
 lifetime

Since b2052564e66d ("mm: memcontrol: continue cache reclaim from
offlined groups"), re-mounting the memory controller after using it is
very likely to hang.

The cgroup core assumes that any remaining references after deleting a
cgroup are temporary in nature, and synchronously waits for them, but
the above-mentioned commit has left-over page cache pin its css until
it is reclaimed naturally.  That being said, swap entries and charged
kernel memory have been doing the same indefinite pinning forever; the
bug is just more likely to trigger with left-over page cache.

Reparenting kernel memory is highly impractical, which leaves changing
the cgroup assumptions to reflect this: once a controller has been
mounted and used, it has internal state that is independent from mount
and cgroup lifetime.  It can be unmounted and remounted, but it can't
be reconfigured during subsequent mounts.

Don't offline the controller root as long as there are any children,
dead or alive.  A remount will no longer wait for these old references
to drain; it will simply mount the persistent controller state again.

Reported-by: "Suzuki K. Poulose" 
Reported-by: Will Deacon 
Signed-off-by: Johannes Weiner 
---
 kernel/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index bb263d0caab3..04cfe8ace520 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1909,7 +1909,7 @@ static void cgroup_kill_sb(struct super_block *sb)
 	 *
 	 * And don't kill the default root.
 	 */
-	if (css_has_online_children(&root->cgrp.self) ||
+	if (!list_empty(&root->cgrp.self.children) ||
 	    root == &cgrp_dfl_root)
 		cgroup_put(&root->cgrp);
 	else
-- 
2.2.0


Re: [Regression] 3.19-rc3 : memcg: Hang in mount memcg

2015-01-22 Thread Tejun Heo
Hello,

On Thu, Jan 22, 2015 at 08:45:50AM -0500, Johannes Weiner wrote:
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index bb263d0caab3..9a09308c8066 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -1819,8 +1819,11 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
>  			goto out_unlock;
>  		}
>  
> -		if (root->flags ^ opts.flags)
> -			pr_warn("new mount options do not match the existing superblock, will be ignored\n");
> +		if (root->flags ^ opts.flags) {
> +			pr_warn("new mount options do not match the existing superblock\n");
> +			ret = -EBUSY;
> +			goto out_unlock;
> +		}

Do we really need the above chunk?

> @@ -1909,7 +1912,7 @@ static void cgroup_kill_sb(struct super_block *sb)
>  	 *
>  	 * And don't kill the default root.
>  	 */
> -	if (css_has_online_children(&root->cgrp.self) ||
> +	if (!list_empty(&root->cgrp.self.children) ||
>  	    root == &cgrp_dfl_root)
>  		cgroup_put(&root->cgrp);

I tried to do something a bit more advanced so that eventual async
release of dying children, if they happen, can also release the
hierarchy but I don't think it really matters unless we can forcefully
drain.  So, shouldn't just the above part be enough?
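
The difference between the old and the new cgroup_kill_sb() condition can be shown with a toy model. It assumes, as the discussion implies, that an offline ("dying") child stays on its parent's children list until it is finally released. model_has_online_children() and model_has_any_children() are hypothetical stand-ins for css_has_online_children() and !list_empty(&root->cgrp.self.children), not their real implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical child record; dying children remain in the array. */
struct model_child {
	bool online;
};

/* Old check: only online children keep the root from being killed. */
static bool model_has_online_children(const struct model_child *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (c[i].online)
			return true;
	return false;
}

/* New check: any child, dead or alive, keeps the root. */
static bool model_has_any_children(const struct model_child *c, size_t n)
{
	assert(c != NULL || n == 0);
	return n > 0;
}

/*
 * True if unmount would only drop a reference (root is kept),
 * false if it would start killing the root.
 */
static bool model_kill_sb_keeps_root(const struct model_child *c, size_t n,
				     bool is_default_root, bool patched)
{
	bool has_children = patched ? model_has_any_children(c, n)
				    : model_has_online_children(c, n);
	return has_children || is_default_root;
}
```

With a single dying child, the old condition lets unmount start killing the root (which the next mount then keeps waiting on), while the new condition keeps the root so that a later mount can reuse it.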

Thanks.

-- 
tejun


Re: [Regression] 3.19-rc3 : memcg: Hang in mount memcg

2015-01-22 Thread Johannes Weiner
On Wed, Jan 21, 2015 at 04:39:55PM +0000, Will Deacon wrote:
> On Mon, Jan 19, 2015 at 12:51:27PM +0000, Suzuki K. Poulose wrote:
> > On 10/01/15 08:55, Vladimir Davydov wrote:
> > > The problem is that the memory cgroup controller takes a css reference
> > > for each charged page and does not reparent charged pages on css
> > > offline, while cgroup_mount/cgroup_kill_sb expect all css references to
> > > offline cgroups to be gone soon, restarting the syscall if the ref count
> > > != 0. As a result, if you create a memory cgroup, charge some page cache
> > > to it, and then remove it, unmount/mount will hang forever.
> > >
> > > Maybe we should kill the ref counter to the memory controller root in
> > > cgroup_kill_sb only if there are no children at all, neither online nor
> > > offline.
> > >
> > 
> > Still reproducible on 3.19-rc5 with the same setup.
> 
> Yeah, I'm seeing the same failure on my setup too.
> 
> > From git bisect, the last good commit is:
> > 
> > commit 8df0c2dcf61781d2efa8e6e5b06870f6c6785735
> > Author: Pranith Kumar 
> > Date:   Wed Dec 10 15:42:28 2014 -0800
> > 
> >  slab: replace smp_read_barrier_depends() with lockless_dereference()
> 
> So that points at 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
> as the offending commit.

With b2052564e66d ("mm: memcontrol: continue cache reclaim from
offlined groups"), page cache can pin an old css and its ancestors
indefinitely, making that hang in a second mount() very likely.

However, swap entries have also been doing that for quite a while now,
and as Vladimir pointed out, the same is true for kernel memory.  This
latest change just makes this existing bug easier to trigger.
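
To make the pinning described above concrete, here is a minimal user-space model. It assumes, as stated above, that each charged page holds a reference on its css and that offlining does not reparent those pages; it also assumes that a child css keeps its parent alive until it is released, which is where "and its ancestors" comes from. All identifiers are hypothetical, and the model ignores RCU, percpu refcounts and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified css (not the kernel structure). */
struct model_css {
	struct model_css *parent;
	int refcnt;   /* base ref + one per charged page + one per live child */
	bool online;  /* cleared by rmdir */
};

static void model_put(struct model_css *css);

/* A released css drops the reference it held on its parent. */
static void model_release(struct model_css *css)
{
	if (css->parent)
		model_put(css->parent);
}

static void model_put(struct model_css *css)
{
	assert(css->refcnt > 0);
	if (--css->refcnt == 0)
		model_release(css);
}

static void model_create(struct model_css *css, struct model_css *parent)
{
	css->parent = parent;
	css->refcnt = 1;          /* base reference, dropped on rmdir */
	css->online = true;
	if (parent)
		parent->refcnt++; /* a child pins its parent */
}

/* rmdir: go offline and drop the base ref, but leave charged pages
 * where they are (no reparenting). */
static void model_rmdir(struct model_css *css)
{
	css->online = false;
	model_put(css);
}

static void model_charge_page(struct model_css *css)   { css->refcnt++; }
static void model_uncharge_page(struct model_css *css) { model_put(css); }
```

After model_create(&root, NULL), model_create(&child, &root), model_charge_page(&child) and model_rmdir(&child), the child is offline but still has refcnt 1, and the root still counts it. Only model_uncharge_page(&child), i.e. reclaim of that page, finally lets both references go.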

I think we have to update the lifetime rules to reflect reality here:
memory and swap lifetime is indefinite, so once the memory controller
is used, it has state that is independent from whether it's mounted or
not.  We can support an identical remount, but have to fail mounting
with new parameters that would change the behavior of the controller.
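
The two policies for a remount with different options, the existing "warn and ignore" and the "fail hard" variant in the patch below, can be sketched as follows. The struct and function names are hypothetical; only the flags comparison (root->flags ^ opts.flags), the warning texts and the -EBUSY return are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the persistent hierarchy state. */
struct model_hierarchy {
	unsigned int flags;  /* options the hierarchy was first mounted with */
};

/*
 * Remount check.  strict == false: warn and reuse the old state.
 * strict == true: refuse the mount with -EBUSY.
 * Returns 0 when the existing state is reused.
 */
static int model_remount(const struct model_hierarchy *h,
			 unsigned int new_flags, bool strict)
{
	assert(h != NULL);
	if (h->flags ^ new_flags) {  /* any differing bit is a mismatch */
		if (strict) {
			fprintf(stderr, "new mount options do not match the existing superblock\n");
			return -EBUSY;
		}
		fprintf(stderr, "new mount options do not match the existing superblock, will be ignored\n");
	}
	return 0;
}
```

Per the follow-up discussion in this thread, the strict variant was dropped and the version that was applied only changes the cgroup_kill_sb() condition.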

Suzuki, Will, could you give the following patch a shot?

Tejun, would that route be acceptable to you?

Thanks

---
From c5e88d02d185c52748df664aa30a2c5f8949b0f7 Mon Sep 17 00:00:00 2001
From: Johannes Weiner 
Date: Thu, 22 Jan 2015 08:16:31 -0500
Subject: [patch] kernel: cgroup: prevent mount hang due to memory controller
 lifetime

Since b2052564e66d ("mm: memcontrol: continue cache reclaim from
offlined groups"), re-mounting the memory controller after using it is
very likely to hang.

The cgroup core assumes that any remaining references after deleting a
cgroup are temporary in nature, and synchronously waits for them, but
the above-mentioned commit has left-over page cache pin its css until
it is reclaimed naturally.  That being said, swap entries and charged
kernel memory have been doing the same indefinite pinning forever; the
bug is just more likely to trigger with left-over page cache.

Reparenting kernel memory is highly impractical, which leaves changing
the cgroup assumptions to reflect this: once a controller has been
mounted and used, it has internal state that is independent from mount
and cgroup lifetime.  It can be unmounted and remounted, but it can't
be reconfigured during subsequent mounts.

Don't offline the controller root as long as there are any children,
dead or alive.  A remount will no longer wait for these old references
to drain; it will simply mount the persistent controller state again.

Reported-by: "Suzuki K. Poulose" 
Reported-by: Will Deacon 
Signed-off-by: Johannes Weiner 
---
 kernel/cgroup.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index bb263d0caab3..9a09308c8066 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1819,8 +1819,11 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 			goto out_unlock;
 		}
 
-		if (root->flags ^ opts.flags)
-			pr_warn("new mount options do not match the existing superblock, will be ignored\n");
+		if (root->flags ^ opts.flags) {
+			pr_warn("new mount options do not match the existing superblock\n");
+			ret = -EBUSY;
+			goto out_unlock;
+		}
 
 		/*
 		 * We want to reuse @root whose lifetime is governed by its
@@ -1909,7 +1912,7 @@ static void cgroup_kill_sb(struct super_block *sb)
 	 *
 	 * And don't kill the default root.
 	 */
-	if (css_has_online_children(&root->cgrp.self) ||
+	if (!list_empty(&root->cgrp.self.children) ||
 	    root == &cgrp_dfl_root)
 		cgroup_put(&root->cgrp);
 	else
-- 
2.2.0


Re: [Regression] 3.19-rc3 : memcg: Hang in mount memcg

2015-01-21 Thread Will Deacon
On Mon, Jan 19, 2015 at 12:51:27PM +0000, Suzuki K. Poulose wrote:
> On 10/01/15 08:55, Vladimir Davydov wrote:
> > The problem is that the memory cgroup controller takes a css reference
> > for each charged page and does not reparent charged pages on css
> > offline, while cgroup_mount/cgroup_kill_sb expect all css references to
> > offline cgroups to be gone soon, restarting the syscall if the ref count
> > != 0. As a result, if you create a memory cgroup, charge some page cache
> > to it, and then remove it, unmount/mount will hang forever.
> >
> > Maybe we should kill the ref counter to the memory controller root in
> > cgroup_kill_sb only if there are no children at all, neither online nor
> > offline.
> >
> 
> Still reproducible on 3.19-rc5 with the same setup.

Yeah, I'm seeing the same failure on my setup too.

> From git bisect, the last good commit is:
> 
> commit 8df0c2dcf61781d2efa8e6e5b06870f6c6785735
> Author: Pranith Kumar 
> Date:   Wed Dec 10 15:42:28 2014 -0800
> 
>  slab: replace smp_read_barrier_depends() with lockless_dereference()

So that points at 3e32cb2e0a12 ("mm: memcontrol: lockless page counters")
as the offending commit.

Will

Re: [Regression] 3.19-rc3 : memcg: Hang in mount memcg

2015-01-19 Thread Suzuki K. Poulose

On 10/01/15 08:55, Vladimir Davydov wrote:
> On Fri, Jan 09, 2015 at 05:43:17PM +, Suzuki K. Poulose wrote:
> > Hi
> >
> > We have hit a hang on ARM64 defconfig, while running LTP tests on
> > 3.19-rc3. We are in the process of a git bisect and will update the
> > results as and when we find the commit.
> >
> > During the ksm ltp run, the test hangs trying to mount memcg with the
> > following strace output:
> >
> > mount("memcg", "/dev/cgroup", "cgroup", 0, "memory") = ? ERESTARTNOINTR (To be restarted)
> > mount("memcg", "/dev/cgroup", "cgroup", 0, "memory") = ? ERESTARTNOINTR (To be restarted)
> > [ ... repeated forever ... ]
> >
> > At this point, one can try mounting the memcg to verify the problem.
> > # mount -t cgroup -o memory memcg memcg_dir
> > --hangs--
> >
> > Strangely, if we run the mount command from a cold boot (i.e. without
> > running LTP first), then it succeeds.
> >
> > Upon a quick look, we are hitting the following code in
> > kernel/cgroup.c: cgroup_mount() (lines 1779-1791):
> >
> > 	for_each_subsys(ss, i) {
> > 		if (!(opts.subsys_mask & (1 << i)) ||
> > 		    ss->root == &cgrp_dfl_root)
> > 			continue;
> >
> > 		if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) {
> > 			mutex_unlock(&cgroup_mutex);
> > 			msleep(10);
> > 			ret = restart_syscall();	/* <= */
> > 			goto out_free;
> > 		}
> > 		cgroup_put(&ss->root->cgrp);
> > 	}
> >
> > with ss->root->cgrp.self.refcnt.percpu_count_ptr == __PERCPU_REF_ATOMIC_DEAD
> >
> > Any ideas?
>
> The problem is that the memory cgroup controller takes a css reference
> for each charged page and does not reparent charged pages on css
> offline, while cgroup_mount/cgroup_kill_sb expect all css references to
> offline cgroups to be gone soon, restarting the syscall if the ref count
> != 0. As a result, if you create a memory cgroup, charge some page cache
> to it, and then remove it, unmount/mount will hang forever.
>
> Maybe we should kill the ref counter of the memory controller root in
> cgroup_kill_sb only if there are no children at all, neither online nor
> offline.

Still reproducible on 3.19-rc5 with the same setup. From git bisect, the
last good commit is:


commit 8df0c2dcf61781d2efa8e6e5b06870f6c6785735
Author: Pranith Kumar 
Date:   Wed Dec 10 15:42:28 2014 -0800

slab: replace smp_read_barrier_depends() with lockless_dereference()



Thanks
Suzuki



Re: [Regression] 3.19-rc3 : memcg: Hang in mount memcg

2015-01-12 Thread Suzuki K. Poulose
On Fri, Jan 09, 2015 at 09:46:49PM +, Tejun Heo wrote:
> On Fri, Jan 09, 2015 at 05:43:17PM +, Suzuki K. Poulose wrote:
> > We have hit a hang on ARM64 defconfig, while running LTP tests on 3.19-rc3.
> > We are in the process of a git bisect and will update the results as and
> > when we find the commit.
> >
> > During the ksm ltp run, the test hangs trying to mount memcg with the
> > following strace output:
> >
> > mount("memcg", "/dev/cgroup", "cgroup", 0, "memory") = ? ERESTARTNOINTR (To be restarted)
> > mount("memcg", "/dev/cgroup", "cgroup", 0, "memory") = ? ERESTARTNOINTR (To be restarted)
> > [ ... repeated forever ... ]
> >
> > At this point, one can try mounting the memcg to verify the problem.
> > # mount -t cgroup -o memory memcg memcg_dir
> > --hangs--
> >
> > Strangely, if we run the mount command from a cold boot (i.e. without
> > running LTP first), then it succeeds.
>
> I don't know what LTP is doing and this could actually be hitting
> an actual bug, but if it's trying to move memcg back from the unified
> hierarchy to an old one, that might hang - it should probably be made to
> just fail at that point.  Anyway, any chance you can find out what
> happened, in terms of cgroup mounting, to memcg up to that point?
>

This is roughly what the test (ksm03) does, from strace:

faccessat(AT_FDCWD, "/sys/kernel/mm/ksm/", F_OK) = 0
faccessat(AT_FDCWD, "/sys/kernel/mm/ksm/merge_across_nodes", F_OK) = -1 ENOENT 
(No such file or directory)
mkdirat(AT_FDCWD, "/dev/cgroup", 0777)  = 0
mount("memcg", "/dev/cgroup", "cgroup", 0, "memory") = 0

--- Set the memory limit: create a new group /dev/cgroup/1 and move the test
into that group ---
mkdirat(AT_FDCWD, "/dev/cgroup/1", 0777) = 0
openat(AT_FDCWD, "/dev/cgroup/1/memory.limit_in_bytes", 
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fstat(3, {st_dev=makedev(0, 24), st_ino=41, st_mode=S_IFREG|0644, st_nlink=1, 
st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=0, 
st_atime=2015/01/12-15:10:13, st_mtime=2015/01/12-15:10:13, 
st_ctime=2015/01/12-15:10:13}) = 0
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7fb2903000
write(3, "1073741824", 10)  = 10
close(3)= 0
munmap(0x7fb2903000, 65536) = 0
getpid()= 1324
openat(AT_FDCWD, "/dev/cgroup/1/tasks", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fstat(3, {st_dev=makedev(0, 24), st_ino=37, st_mode=S_IFREG|0644, st_nlink=1, 
st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=0, 
st_atime=2015/01/12-15:10:13, st_mtime=2015/01/12-15:10:13, 
st_ctime=2015/01/12-15:10:13}) = 0
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7fb2903000
write(3, "1324", 4) = 4
close(3)= 0
munmap(0x7fb2903000, 65536) = 0

clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7fb2a7f0d0) = 1325
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7fb2a7f0d0) = 1326
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, 
child_tidptr=0x7fb2a7f0d0) = 1327

--- Create 3 children that perform a lot of memory operations on shared pages,
verify KSM activity and wait for the children to exit ---

wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WSTOPPED|WCONTINUED, NULL) = 
1325
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WSTOPPED|WCONTINUED, NULL) = 
1326
wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WSTOPPED|WCONTINUED, NULL) = 
1327
wait4(-1, 0x7fe5625f3c, WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child 
processes)

--- Cleanup: move tasks from /dev/cgroup/1/ to /dev/cgroup/, delete the
subdir and umount the cgroup ---

faccessat(AT_FDCWD, "/sys/kernel/mm/ksm/merge_across_nodes", F_OK) = -1 ENOENT 
(No such file or directory)
openat(AT_FDCWD, "/dev/cgroup/tasks", O_WRONLY) = 205
openat(AT_FDCWD, "/dev/cgroup/1/tasks", O_RDONLY) = 206
fstat(206, {st_dev=makedev(0, 24), st_ino=37, st_mode=S_IFREG|0644, st_nlink=1, 
st_uid=0, st_gid=0, st_blksize=4096, st_blocks=0, st_size=0, 
st_atime=2015/01/12-15:10:13, st_mtime=2015/01/12-15:10:13, 
st_ctime=2015/01/12-15:10:13}) = 0
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 
0x7fb1c53000
read(206, "1324\n", 4096)   = 5
write(205, "1324", 4)   = 4
read(206, "", 4096) = 0
close(205)  = 0
close(206)  = 0
munmap(0x7fb1c53000, 65536) = 0
unlinkat(AT_FDCWD, "/dev/cgroup/1", AT_REMOVEDIR) = 0
umount2("/dev/cgroup", 0)   = 0
unlinkat(AT_FDCWD, "/dev/cgroup", AT_REMOVEDIR) = 0
exit_group(0)   = ?


The next invocation of the same test fails to mount the cgroup memory.
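The ksm03 sequence above can be condensed into a standalone C sketch that issues the same mkdir/mount/write/rmdir/umount2 calls and then mounts again. Assumptions (not from the report): it runs as root on a kernel with the legacy memory controller not mounted elsewhere, /dev/cgroup is free to use, and writing about 1 MiB to a regular file (the hypothetical `memcg-repro.dat` in the current directory, left in place) stands in for ksm03's KSM workload as a way to leave page cache charged to the removed group. On affected kernels the final mount() is expected to restart forever, so the function would not return.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <unistd.h>

#define CG_DIR     "/dev/cgroup"	/* same paths as ksm03 */
#define CG_SUB     CG_DIR "/1"
#define CACHE_FILE "memcg-repro.dat"	/* hypothetical scratch file */

/* Write a string to an existing file, e.g. a cgroup control file. */
static int write_str(const char *path, const char *s)
{
	assert(path != NULL && s != NULL);
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	size_t len = strlen(s);
	ssize_t n = write(fd, s, len);
	int err = n < 0 ? -errno : ((size_t)n == len ? 0 : -EIO);
	close(fd);
	return err;
}

/* Create ~1 MiB of page cache; it is charged to the caller's memcg. */
static int make_page_cache(void)
{
	char buf[4096];
	int err = 0;

	memset(buf, 'x', sizeof(buf));
	int fd = open(CACHE_FILE, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -errno;
	for (int i = 0; i < 256 && err == 0; i++)
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
			err = -EIO;
	close(fd);	/* the file is left in place so its pages stay cached */
	return err;
}

/*
 * Mirror ksm03's setup and cleanup, then mount again. Returns 0 if the
 * second mount() succeeds, otherwise the number of the failing step.
 * On affected kernels the second mount() restarts forever.
 */
static int memcg_mount_repro(void)
{
	char pid[32];
	snprintf(pid, sizeof(pid), "%d", (int)getpid());

	if (mkdir(CG_DIR, 0777) != 0 && errno != EEXIST)
		return 1;
	if (mount("memcg", CG_DIR, "cgroup", 0, "memory") != 0)
		return 2;
	if (mkdir(CG_SUB, 0777) != 0)
		return 3;
	if (write_str(CG_SUB "/memory.limit_in_bytes", "1073741824") != 0)
		return 4;
	if (write_str(CG_SUB "/tasks", pid) != 0)	/* enter group 1 */
		return 5;
	if (make_page_cache() != 0)			/* charged to group 1 */
		return 6;
	if (write_str(CG_DIR "/tasks", pid) != 0)	/* back to the root */
		return 7;
	if (rmdir(CG_SUB) != 0)		/* offline; charges are not reparented */
		return 8;
	if (umount2(CG_DIR, 0) != 0)
		return 9;
	if (mount("memcg", CG_DIR, "cgroup", 0, "memory") != 0)	/* hangs here */
		return 10;
	return umount2(CG_DIR, 0) != 0 ? 11 : 0;
}
```

To try it, call `memcg_mount_repro()` from a `main()` run as root on a test machine; the scratch file can be deleted afterwards.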

Thanks
Suzuki

> Thanks.
>
> --
> tejun
>

-- IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential 

Re: [Regression] 3.19-rc3 : memcg: Hang in mount memcg

2015-01-10 Thread Vladimir Davydov
On Fri, Jan 09, 2015 at 05:43:17PM +, Suzuki K. Poulose wrote:
> Hi
> 
> We have hit a hang on ARM64 defconfig, while running LTP tests on
> 3.19-rc3. We are in the process of a git bisect and will update the
> results as and when we find the commit.
>
> During the ksm ltp run, the test hangs trying to mount memcg with
> the following strace output:
>
> mount("memcg", "/dev/cgroup", "cgroup", 0, "memory") = ? ERESTARTNOINTR (To be restarted)
> mount("memcg", "/dev/cgroup", "cgroup", 0, "memory") = ? ERESTARTNOINTR (To be restarted)
> [ ... repeated forever ... ]
>
> At this point, one can try mounting the memcg to verify the problem.
> # mount -t cgroup -o memory memcg memcg_dir
> --hangs--
>
> Strangely, if we run the mount command from a cold boot (i.e.
> without running LTP first), then it succeeds.
>
> Upon a quick look, we are hitting the following code in
> kernel/cgroup.c: cgroup_mount() (lines 1779-1791):
>
> 	for_each_subsys(ss, i) {
> 		if (!(opts.subsys_mask & (1 << i)) ||
> 		    ss->root == &cgrp_dfl_root)
> 			continue;
>
> 		if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) {
> 			mutex_unlock(&cgroup_mutex);
> 			msleep(10);
> 			ret = restart_syscall();	/* <= */
> 			goto out_free;
> 		}
> 		cgroup_put(&ss->root->cgrp);
> 	}
>
> with ss->root->cgrp.self.refcnt.percpu_count_ptr == __PERCPU_REF_ATOMIC_DEAD
> 
> Any ideas?

The problem is that the memory cgroup controller takes a css reference
for each charged page and does not reparent charged pages on css
offline, while cgroup_mount/cgroup_kill_sb expect all css references to
offline cgroups to be gone soon, restarting the syscall if the ref count
!= 0. As a result, if you create a memory cgroup, charge some page cache
to it, and then remove it, unmount/mount will hang forever.

Maybe we should kill the ref counter of the memory controller root in
cgroup_kill_sb only if there are no children at all, neither online nor
offline.

Thanks,
Vladimir
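The lifetime problem described above can be sketched as a small userspace model. It assumes, as stated, one css reference per charged page and no reparenting of charges on rmdir; the base reference held while the cgroup is online is an additional assumption of the model. `struct model_memcg` and every `model_*` helper are hypothetical, and the last helper only encodes the condition suggested for cgroup_kill_sb, not any actual kernel patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one memory cgroup's css reference count. */
struct model_memcg {
	bool online;		/* false after rmdir (css offline) */
	long nr_charged;	/* pages still charged to this memcg */
};

/* Model assumption: a base ref while online, plus one ref per charged page. */
static long model_css_refs(const struct model_memcg *cg)
{
	return (cg->online ? 1 : 0) + cg->nr_charged;
}

static void model_memcg_create(struct model_memcg *cg)
{
	cg->online = true;
	cg->nr_charged = 0;
}

static void model_charge_page(struct model_memcg *cg)
{
	assert(cg->online);	/* model: only online groups take new charges */
	cg->nr_charged++;
}

/* The page is freed, e.g. page cache that is finally reclaimed. */
static void model_uncharge_page(struct model_memcg *cg)
{
	assert(cg->nr_charged > 0);
	cg->nr_charged--;
}

/* rmdir: the css goes offline, but existing charges are NOT reparented. */
static void model_offline(struct model_memcg *cg)
{
	cg->online = false;
}

/* Only at zero can the css (and the hierarchy root it pins) be released. */
static bool model_css_released(const struct model_memcg *cg)
{
	return model_css_refs(cg) == 0;
}

/*
 * The suggested condition: kill the root's ref in cgroup_kill_sb only
 * if the hierarchy has no children at all, online or offline.
 */
static bool model_may_kill_root_ref(int nr_online_children,
				    int nr_offline_children)
{
	return nr_online_children == 0 && nr_offline_children == 0;
}
```

Between `model_offline()` and the last `model_uncharge_page()`, the offline css keeps a non-zero count for as long as its page cache survives, which may be indefinitely; that is the window in which the next mount() keeps restarting.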


Re: [Regression] 3.19-rc3 : memcg: Hang in mount memcg

2015-01-09 Thread Tejun Heo
On Fri, Jan 09, 2015 at 05:43:17PM +, Suzuki K. Poulose wrote:
> We have hit a hang on ARM64 defconfig, while running LTP tests on 3.19-rc3.
> We are in the process of a git bisect and will update the results as and
> when we find the commit.
> 
> During the ksm ltp run, the test hangs trying to mount memcg with the
> following strace output:
> 
> mount("memcg", "/dev/cgroup", "cgroup", 0, "memory") = ? ERESTARTNOINTR (To be restarted)
> mount("memcg", "/dev/cgroup", "cgroup", 0, "memory") = ? ERESTARTNOINTR (To be restarted)
> [ ... repeated forever ... ]
> 
> At this point, one can try mounting the memcg to verify the problem.
> # mount -t cgroup -o memory memcg memcg_dir
> --hangs--
> 
> Strangely, if we run the mount command from a cold boot (i.e. without
> running LTP first), then it succeeds.

I don't know what LTP is doing and this could actually be hitting
an actual bug, but if it's trying to move memcg back from the unified
hierarchy to an old one, that might hang - it should probably be made to
just fail at that point.  Anyway, any chance you can find out what
happened, in terms of cgroup mounting, to memcg up to that point?

Thanks.

-- 
tejun


[Regression] 3.19-rc3 : memcg: Hang in mount memcg

2015-01-09 Thread Suzuki K. Poulose

Hi

We have hit a hang on ARM64 defconfig, while running LTP tests on
3.19-rc3. We are in the process of a git bisect and will update the
results as and when we find the commit.

During the ksm ltp run, the test hangs trying to mount memcg with the
following strace output:

mount("memcg", "/dev/cgroup", "cgroup", 0, "memory") = ? ERESTARTNOINTR (To be restarted)
mount("memcg", "/dev/cgroup", "cgroup", 0, "memory") = ? ERESTARTNOINTR (To be restarted)
[ ... repeated forever ... ]

At this point, one can try mounting the memcg to verify the problem.
# mount -t cgroup -o memory memcg memcg_dir
--hangs--

Strangely, if we run the mount command from a cold boot (i.e. without
running LTP first), then it succeeds.

Upon a quick look, we are hitting the following code in
kernel/cgroup.c: cgroup_mount() (lines 1779-1791):

	for_each_subsys(ss, i) {
		if (!(opts.subsys_mask & (1 << i)) ||
		    ss->root == &cgrp_dfl_root)
			continue;

		if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) {
			mutex_unlock(&cgroup_mutex);
			msleep(10);
			ret = restart_syscall();	/* <= */
			goto out_free;
		}
		cgroup_put(&ss->root->cgrp);
	}

with ss->root->cgrp.self.refcnt.percpu_count_ptr == __PERCPU_REF_ATOMIC_DEAD

Any ideas?

Thanks
Suzuki
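A minimal single-threaded userspace model of the check quoted above illustrates why mount() restarts forever instead of failing. It is not kernel code: `struct model_ref`, `model_tryget_live()`, `model_mount()` and the `max_tries` bound are hypothetical. The model assumes, as the report shows, that the old hierarchy root's refcount has been killed (dead) while references are still outstanding; in that state `percpu_ref_tryget_live()` refuses every attempt, so the real loop can only end once the last reference is dropped and the old root goes away, leaving the controller on the default hierarchy where the check is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code: a stand-in for the percpu_ref
 * embedded in the old hierarchy root (ss->root->cgrp.self.refcnt).
 */
struct model_ref {
	long count;	/* outstanding references */
	bool dead;	/* set once the ref has been killed at umount */
};

/* Like percpu_ref_tryget_live(): fails on a dead ref even if count > 0. */
static bool model_tryget_live(struct model_ref *ref)
{
	if (ref->dead || ref->count == 0)
		return false;
	ref->count++;
	return true;
}

static void model_put(struct model_ref *ref)
{
	assert(ref->count > 0);
	ref->count--;
}

/*
 * One pass of the per-subsystem check in cgroup_mount(). old_root is
 * NULL when the controller is already back on the default hierarchy.
 * Returns true if mount may proceed, false if it would restart.
 */
static bool model_mount_check(struct model_ref *old_root)
{
	if (old_root == NULL)
		return true;		/* ss->root == &cgrp_dfl_root */
	if (!model_tryget_live(old_root))
		return false;		/* msleep(10); restart_syscall() */
	model_put(old_root);		/* cgroup_put() */
	return true;
}

/*
 * Retry the way the restarted syscall does, but give up after
 * max_tries. Returns the attempt that succeeded, or -1 if the check
 * still fails (the reported hang).
 */
static int model_mount(struct model_ref *old_root, int max_tries)
{
	for (int attempt = 1; attempt <= max_tries; attempt++)
		if (model_mount_check(old_root))
			return attempt;
	return -1;
}
```

With `dead` set and `count > 0`, `model_mount()` never succeeds however large `max_tries` is, which matches the endless ERESTARTNOINTR in the strace.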


