Re: [PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-22 Thread Jiri Slaby
On 09/16/2012 09:07 PM, Hugh Dickins wrote:
>> What was the way that
>> Hugh used to reproduce the other issue?
> 
> I've lost track of which issue is "other".

The other was meant to be the BUG I hit.

> To reproduce Sasha's interval_tree.c warnings, all I had to do was switch
> on CONFIG_DEBUG_VM_RB (I regret not having done so before) and boot up.
> 
> I didn't look to see what was doing the mremap which caused the warning
> until now: surprisingly, it's microcode_ctl.  I've not made much effort
> to get the right set of sources and work out why that would be using
> mremap (a realloc inside a library?).
> 
> I failed to reproduce your BUG in huge_memory.c, but what I was trying
> was SuSE update via yast2, on several machines; but perhaps because
> they were all fairly close to up-to-date, I didn't hit a problem.
> (That was before I turned on DEBUG_VM_RB for Sasha's.)

The good news is that I cannot reproduce either with the patch applied.

thanks,
-- 
js
suse labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-20 Thread Fengguang Wu
On Thu, Sep 20, 2012 at 03:27:11PM -0700, Hugh Dickins wrote:
> On Fri, 21 Sep 2012, Fengguang Wu wrote:
> > On Sat, Sep 15, 2012 at 11:26:23AM +0200, Sasha Levin wrote:
> > > On 09/15/2012 02:00 AM, Michel Lespinasse wrote:
> > > > All right. Hugh managed to reproduce the issue on his suse laptop, and
> > > > I came up with a fix.
> > > >
> > > > The problem was that in mremap, the new vma's vm_{start,end,pgoff}
> > > > fields need to be updated before calling anon_vma_clone() so that the
> > > > new vma will be properly indexed.
> > > >
> > > > Patch attached. I expect this should also explain Jiri's reported
> > > > failure involving splitting THP pages during mremap(), even though we
> > > > did not manage to reproduce that one.
> > >
> > > Initially I've stumbled on it by running trinity inside a KVM tools guest. fwiw,
> > > the guest is pretty custom and isn't based on suse.
> > >
> > > I re-ran tests with patch applied and looks like it fixed the issue, I haven't
> > > seen the warnings even though it runs for quite a while now.
> > 
> > Not sure if it's the same problem you are talking about, but I got the
> > below warning and it's still happening in linux-next 20120920:
> 
> It is (almost certainly) the same problem, for which Michel provided
> the fix earlier in this thread (some of us find we have to delete a
> " {" from the context at the end to get it to apply).
> 
> That fix has gone into akpm's tree, but linux-next is still using an
> older rollup of akpm's tree.

Got it, thank you for the quick information!

Thanks,
Fengguang

> > [   38.482925] scsi_nl_rcv_msg: discarding partial skb
> > [   62.679879] ------------[ cut here ]------------
> > [   62.680380] WARNING: at /c/kernel-tests/src/linux/mm/interval_tree.c:109 anon_vma_interval_tree_verify+0x33/0x80()
> > [   62.681356] Pid: 195, comm: trinity-child0 Not tainted 3.6.0-rc6-next-20120918-08732-g3de9d1a #1
> > [   62.682130] Call Trace:
> > [   62.682356]  [] ? anon_vma_interval_tree_verify+0x33/0x80
> > [   62.682968]  [] warn_slowpath_common+0x5d/0x74
> > [   62.683577]  [] warn_slowpath_null+0x15/0x19
> > [   62.684098]  [] anon_vma_interval_tree_verify+0x33/0x80
> > [   62.684714]  [] validate_mm+0x32/0x15b
> > [   62.685202]  [] vma_link+0x95/0xa4
> > [   62.685637]  [] copy_vma+0x1c7/0x1fe
> > [   62.686168]  [] move_vma+0x90/0x1ef
> > [   62.686614]  [] sys_mremap+0x3a1/0x429
> > [   62.687094]  [] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [   62.687670]  [] system_call_fastpath+0x16/0x1b
> > 
> > Bisected down to 
> > 
> > commit cb58d445d2ec3a06f313e29d6f6af5bef6c9e43c
> > Author: Michel Lespinasse 
> > Date:   Thu Sep 13 10:58:56 2012 +1000
> > 
> > mm: add CONFIG_DEBUG_VM_RB build option
> > 
> > Thanks,
> > Fengguang


Re: [PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-20 Thread Hugh Dickins
On Fri, 21 Sep 2012, Fengguang Wu wrote:
> On Sat, Sep 15, 2012 at 11:26:23AM +0200, Sasha Levin wrote:
> > On 09/15/2012 02:00 AM, Michel Lespinasse wrote:
> > > All right. Hugh managed to reproduce the issue on his suse laptop, and
> > > I came up with a fix.
> > >
> > > The problem was that in mremap, the new vma's vm_{start,end,pgoff}
> > > fields need to be updated before calling anon_vma_clone() so that the
> > > new vma will be properly indexed.
> > >
> > > Patch attached. I expect this should also explain Jiri's reported
> > > failure involving splitting THP pages during mremap(), even though we
> > > did not manage to reproduce that one.
> >
> > Initially I've stumbled on it by running trinity inside a KVM tools guest. fwiw,
> > the guest is pretty custom and isn't based on suse.
> >
> > I re-ran tests with patch applied and looks like it fixed the issue, I haven't
> > seen the warnings even though it runs for quite a while now.
> 
> Not sure if it's the same problem you are talking about, but I got the
> below warning and it's still happening in linux-next 20120920:

It is (almost certainly) the same problem, for which Michel provided
the fix earlier in this thread (some of us find we have to delete a
" {" from the context at the end to get it to apply).

That fix has gone into akpm's tree, but linux-next is still using an
older rollup of akpm's tree.

Thanks,
Hugh

> 
> [   38.482925] scsi_nl_rcv_msg: discarding partial skb
> [   62.679879] ------------[ cut here ]------------
> [   62.680380] WARNING: at /c/kernel-tests/src/linux/mm/interval_tree.c:109 anon_vma_interval_tree_verify+0x33/0x80()
> [   62.681356] Pid: 195, comm: trinity-child0 Not tainted 3.6.0-rc6-next-20120918-08732-g3de9d1a #1
> [   62.682130] Call Trace:
> [   62.682356]  [] ? anon_vma_interval_tree_verify+0x33/0x80
> [   62.682968]  [] warn_slowpath_common+0x5d/0x74
> [   62.683577]  [] warn_slowpath_null+0x15/0x19
> [   62.684098]  [] anon_vma_interval_tree_verify+0x33/0x80
> [   62.684714]  [] validate_mm+0x32/0x15b
> [   62.685202]  [] vma_link+0x95/0xa4
> [   62.685637]  [] copy_vma+0x1c7/0x1fe
> [   62.686168]  [] move_vma+0x90/0x1ef
> [   62.686614]  [] sys_mremap+0x3a1/0x429
> [   62.687094]  [] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [   62.687670]  [] system_call_fastpath+0x16/0x1b
> 
> Bisected down to 
> 
> commit cb58d445d2ec3a06f313e29d6f6af5bef6c9e43c
> Author: Michel Lespinasse 
> Date:   Thu Sep 13 10:58:56 2012 +1000
> 
> mm: add CONFIG_DEBUG_VM_RB build option
> 
> Thanks,
> Fengguang


Re: [PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-16 Thread Hugh Dickins
On Sat, 15 Sep 2012, Jiri Slaby wrote:
> On 09/15/2012 02:00 AM, Michel Lespinasse wrote:
> > All right. Hugh managed to reproduce the issue on his suse laptop, and
> > I came up with a fix.
> > 
> > The problem was that in mremap, the new vma's vm_{start,end,pgoff}
> > fields need to be updated before calling anon_vma_clone() so that the
> > new vma will be properly indexed.
> > 
> > Patch attached. I expect this should also explain Jiri's reported
> > failure involving splitting THP pages during mremap(), even though we
> > did not manage to reproduce that one.
> 
> Oh, great. This is BTW also a machine with suse.

We guessed that for you it might be :)
I've not yet moved up from 11.4 by the way, if that makes a difference.

In fact, even before these reports, when Michel was wondering about the
uses of mremap, I did mention an mremap/THP bug from a year ago, which
the SuSE update had been good for reproducing.

> What was the way that
> Hugh used to reproduce the other issue?

I've lost track of which issue is "other".

To reproduce Sasha's interval_tree.c warnings, all I had to do was switch
on CONFIG_DEBUG_VM_RB (I regret not having done so before) and boot up.

I didn't look to see what was doing the mremap which caused the warning
until now: surprisingly, it's microcode_ctl.  I've not made much effort
to get the right set of sources and work out why that would be using
mremap (a realloc inside a library?).

I failed to reproduce your BUG in huge_memory.c, but what I was trying
was SuSE update via yast2, on several machines; but perhaps because
they were all fairly close to up-to-date, I didn't hit a problem.
(That was before I turned on DEBUG_VM_RB for Sasha's.)

Hugh

> For me it happened twice in a
> row when using zypper to upgrade packages. But it did not happen any
> more after that.
> 
> thanks,
> -- 
> js
> suse labs


Re: [PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-15 Thread Sasha Levin
On 09/15/2012 02:00 AM, Michel Lespinasse wrote:
> All right. Hugh managed to reproduce the issue on his suse laptop, and
> I came up with a fix.
> 
> The problem was that in mremap, the new vma's vm_{start,end,pgoff}
> fields need to be updated before calling anon_vma_clone() so that the
> new vma will be properly indexed.
> 
> Patch attached. I expect this should also explain Jiri's reported
> failure involving splitting THP pages during mremap(), even though we
> did not manage to reproduce that one.

Initially I've stumbled on it by running trinity inside a KVM tools guest. fwiw,
the guest is pretty custom and isn't based on suse.

I re-ran tests with patch applied and looks like it fixed the issue, I haven't
seen the warnings even though it runs for quite a while now.


Thanks,
Sasha


Re: [PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-15 Thread Jiri Slaby
On 09/15/2012 02:00 AM, Michel Lespinasse wrote:
> All right. Hugh managed to reproduce the issue on his suse laptop, and
> I came up with a fix.
> 
> The problem was that in mremap, the new vma's vm_{start,end,pgoff}
> fields need to be updated before calling anon_vma_clone() so that the
> new vma will be properly indexed.
> 
> Patch attached. I expect this should also explain Jiri's reported
> failure involving splitting THP pages during mremap(), even though we
> did not manage to reproduce that one.

Oh, great. This is BTW also a machine with suse. What was the way that
Hugh used to reproduce the other issue? For me it happened twice in a
row when using zypper to upgrade packages. But it did not happen any
more after that.

thanks,
-- 
js
suse labs


Re: [PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-14 Thread Michel Lespinasse
On Fri, Sep 14, 2012 at 3:46 PM, Michel Lespinasse wrote:
> On Fri, Sep 14, 2012 at 3:14 PM, Sasha Levin wrote:
>> On 09/04/2012 11:20 AM, Michel Lespinasse wrote:
>>> Add a CONFIG_DEBUG_VM_RB build option for the previously existing
>>> DEBUG_MM_RB code. Now that Andi Kleen modified it to avoid using
>>> recursive algorithms, we can expose it a bit more.
>>>
>>> Also extend this code to validate_mm() after stack expansion, and to
>>> check that the vma's start and last pgoffs have not changed since the
>>> nodes were inserted on the anon vma interval tree (as it is important
>>> that the nodes be reindexed after each such update).
>>
>> This patch exposes the following warning:
>>
>> [   24.977502] ------------[ cut here ]------------
>> [   24.979089] WARNING: at mm/interval_tree.c:110 anon_vma_interval_tree_verify+0x81/0xa0()
>> [   24.981765] Pid: 5928, comm: trinity-child37 Tainted: GW 3.6.0-rc5-next-20120914-sasha-3-g7deb7fa-dirty #333
>> [   24.985501] Call Trace:
>> [   24.986345]  [] ? anon_vma_interval_tree_verify+0x81/0xa0
>> [   24.988535]  [] warn_slowpath_common+0x86/0xb0
>> [   24.990636]  [] warn_slowpath_null+0x15/0x20
>> [   24.992658]  [] anon_vma_interval_tree_verify+0x81/0xa0
>> [   24.994980]  [] validate_mm+0x58/0x1e0
>> [   24.996772]  [] vma_link+0x94/0xe0
>> [   24.997719]  [] copy_vma+0x279/0x2e0
>> [   24.998522]  [] ? trace_hardirqs_off+0xd/0x10
>> [   25.000772]  [] move_vma+0xa9/0x260
>> [   25.002499]  [] sys_mremap+0x475/0x540
>> [   25.004364]  [] tracesys+0xe1/0xe6
>> [   25.006108] ---[ end trace 7c901670963aa6e2 ]---
>>
>> The code line is
>>
>> WARN_ON_ONCE(node->cached_vma_last != avc_last_pgoff(node));
>
> That's very interesting (and potentially relevant to another bug
> that's been reported too).
>
> I'd like to know, what workload did you use that triggered this ?
> (I find it hard to test mremap as I don't know of enough users of it)

All right. Hugh managed to reproduce the issue on his suse laptop, and
I came up with a fix.

The problem was that in mremap, the new vma's vm_{start,end,pgoff}
fields need to be updated before calling anon_vma_clone() so that the
new vma will be properly indexed.

Patch attached. I expect this should also explain Jiri's reported
failure involving splitting THP pages during mremap(), even though we
did not manage to reproduce that one.
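
To make the ordering problem concrete, here is a minimal user-space sketch. It is not kernel code: `toy_vma`, `toy_node`, `toy_index()` and `toy_copy_vma()` are hypothetical stand-ins, a 4 KiB page is assumed, and "indexing" is reduced to snapshotting the start and last page offsets at insertion time. The only point is that indexing a copy before its position is set leaves it filed under the old vma's keys.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the vm_area_struct fields that matter here. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_pgoff;
};

/* Hypothetical tree node that remembers the keys it was inserted under. */
struct toy_node {
	const struct toy_vma *vma;
	unsigned long cached_start;
	unsigned long cached_last;
};

static unsigned long toy_last_pgoff(const struct toy_vma *v)
{
	assert(v->vm_end > v->vm_start);
	return v->vm_pgoff + ((v->vm_end - v->vm_start) >> TOY_PAGE_SHIFT) - 1;
}

/* "Index" the vma: snapshot the keys it has at this moment. */
static void toy_index(struct toy_node *n, const struct toy_vma *v)
{
	n->vma = v;
	n->cached_start = v->vm_pgoff;
	n->cached_last = toy_last_pgoff(v);
}

/* True if the node is still filed under the vma's current keys. */
static bool toy_verify(const struct toy_node *n)
{
	return n->cached_start == n->vma->vm_pgoff &&
	       n->cached_last == toy_last_pgoff(n->vma);
}

/*
 * Copy a vma to a new position and index the copy.
 * position_first == false indexes before the position is set;
 * position_first == true sets the position first.
 * Returns whether the indexed copy passes verification.
 */
static bool toy_copy_vma(const struct toy_vma *old, unsigned long addr,
			 unsigned long len, unsigned long pgoff,
			 bool position_first)
{
	struct toy_vma new_vma = *old;
	struct toy_node node;

	if (!position_first)
		toy_index(&node, &new_vma);	/* filed under the old keys */

	new_vma.vm_start = addr;
	new_vma.vm_end = addr + len;
	new_vma.vm_pgoff = pgoff;

	if (position_first)
		toy_index(&node, &new_vma);	/* filed under the new keys */

	return toy_verify(&node);
}
```

In this sketch, calling `toy_copy_vma()` with `position_first` set to false fails verification whenever the new offset or length differs from the old one, while setting the position first always passes.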

-8<---

From: Michel Lespinasse 
Date: Fri, 14 Sep 2012 16:43:49 -0700
Subject: [PATCH] mm anon rmap: in mremap, set the new vma's position before
 anon_vma_clone()

anon_vma_clone() expects new_vma->vm_{start,end,pgoff} to be correctly set
so that the new vma can be indexed on the anon interval tree.

copy_vma() was failing to do that, which broke mremap().

Signed-off-by: Michel Lespinasse 

---
 mm/mmap.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index cc8c64077a42..7e672800b5d4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2446,16 +2446,16 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 		new_vma = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 		if (new_vma) {
 			*new_vma = *vma;
+			new_vma->vm_start = addr;
+			new_vma->vm_end = addr + len;
+			new_vma->vm_pgoff = pgoff;
 			pol = mpol_dup(vma_policy(vma));
 			if (IS_ERR(pol))
 				goto out_free_vma;
+			vma_set_policy(new_vma, pol);
 			INIT_LIST_HEAD(&new_vma->anon_vma_chain);
 			if (anon_vma_clone(new_vma, vma))
 				goto out_free_mempol;
-			vma_set_policy(new_vma, pol);
-			new_vma->vm_start = addr;
-			new_vma->vm_end = addr + len;
-			new_vma->vm_pgoff = pgoff;
 			if (new_vma->vm_file) {
 				get_file(new_vma->vm_file);
 
-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


Re: [PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-14 Thread Michel Lespinasse
On Fri, Sep 14, 2012 at 3:14 PM, Sasha Levin wrote:
> On 09/04/2012 11:20 AM, Michel Lespinasse wrote:
>> Add a CONFIG_DEBUG_VM_RB build option for the previously existing
>> DEBUG_MM_RB code. Now that Andi Kleen modified it to avoid using
>> recursive algorithms, we can expose it a bit more.
>>
>> Also extend this code to validate_mm() after stack expansion, and to
>> check that the vma's start and last pgoffs have not changed since the
>> nodes were inserted on the anon vma interval tree (as it is important
>> that the nodes be reindexed after each such update).
>
> This patch exposes the following warning:
>
> [   24.977502] ------------[ cut here ]------------
> [   24.979089] WARNING: at mm/interval_tree.c:110 anon_vma_interval_tree_verify+0x81/0xa0()
> [   24.981765] Pid: 5928, comm: trinity-child37 Tainted: GW 3.6.0-rc5-next-20120914-sasha-3-g7deb7fa-dirty #333
> [   24.985501] Call Trace:
> [   24.986345]  [] ? anon_vma_interval_tree_verify+0x81/0xa0
> [   24.988535]  [] warn_slowpath_common+0x86/0xb0
> [   24.990636]  [] warn_slowpath_null+0x15/0x20
> [   24.992658]  [] anon_vma_interval_tree_verify+0x81/0xa0
> [   24.994980]  [] validate_mm+0x58/0x1e0
> [   24.996772]  [] vma_link+0x94/0xe0
> [   24.997719]  [] copy_vma+0x279/0x2e0
> [   24.998522]  [] ? trace_hardirqs_off+0xd/0x10
> [   25.000772]  [] move_vma+0xa9/0x260
> [   25.002499]  [] sys_mremap+0x475/0x540
> [   25.004364]  [] tracesys+0xe1/0xe6
> [   25.006108] ---[ end trace 7c901670963aa6e2 ]---
>
> The code line is
>
> WARN_ON_ONCE(node->cached_vma_last != avc_last_pgoff(node));

That's very interesting (and potentially relevant to another bug
that's been reported too).

I'd like to know, what workload did you use that triggered this ?
(I find it hard to test mremap as I don't know of enough users of it)
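
For anyone who wants a small user of mremap to test with, the following is a rough sketch of a user-space exerciser. It makes no claim to reproduce the warning; it only assumes that growing a mapping that is blocked by an adjacent vma, with MREMAP_MAYMOVE set, makes the kernel relocate it. The function name `force_mremap_move()` is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Grow an anonymous mapping with mremap() while another vma sits directly
 * behind it, so the mapping cannot be expanded in place and has to move.
 * Returns 1 if the mapping moved with its contents intact, 0 if it stayed
 * in place with its contents intact, and -1 on any failure.
 */
static int force_mremap_move(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t pg, old_len, new_len, i;
	unsigned char *base, *moved;
	int ret;

	if (page <= 0)
		return -1;
	pg = (size_t)page;
	old_len = 4 * pg;
	new_len = 16 * pg;

	/* One extra page at the end will become the blocker. */
	base = mmap(NULL, old_len + pg, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* A different protection splits the last page off as its own vma. */
	if (mprotect(base + old_len, pg, PROT_NONE) != 0) {
		munmap(base, old_len + pg);
		return -1;
	}

	/* Fill the payload with a recognisable pattern. */
	for (i = 0; i < old_len; i++)
		base[i] = (unsigned char)(i & 0xff);

	moved = mremap(base, old_len, new_len, MREMAP_MAYMOVE);
	if (moved == MAP_FAILED) {
		munmap(base, old_len + pg);
		return -1;
	}

	/* The pattern must have travelled with the mapping. */
	ret = (moved != base) ? 1 : 0;
	for (i = 0; i < old_len; i++)
		if (moved[i] != (unsigned char)(i & 0xff))
			ret = -1;

	munmap(moved, new_len);
	munmap(base + old_len, pg);
	return ret;
}
```

Calling this in a loop from `main()`, on a kernel built with the debug option, would be one way to hit the mremap path on demand.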

Thanks,

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


Re: [PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-14 Thread Sasha Levin
On 09/15/2012 12:14 AM, Sasha Levin wrote:
> On 09/04/2012 11:20 AM, Michel Lespinasse wrote:
>> Add a CONFIG_DEBUG_VM_RB build option for the previously existing
>> DEBUG_MM_RB code. Now that Andi Kleen modified it to avoid using
>> recursive algorithms, we can expose it a bit more.
>>
>> Also extend this code to validate_mm() after stack expansion, and to
>> check that the vma's start and last pgoffs have not changed since the
>> nodes were inserted on the anon vma interval tree (as it is important
>> that the nodes be reindexed after each such update).
> 
> This patch exposes the following warning:
> 
> [   24.977502] ------------[ cut here ]------------
> [   24.979089] WARNING: at mm/interval_tree.c:110 anon_vma_interval_tree_verify+0x81/0xa0()
> [   24.981765] Pid: 5928, comm: trinity-child37 Tainted: GW 3.6.0-rc5-next-20120914-sasha-3-g7deb7fa-dirty #333
> [   24.985501] Call Trace:
> [   24.986345]  [] ? anon_vma_interval_tree_verify+0x81/0xa0
> [   24.988535]  [] warn_slowpath_common+0x86/0xb0
> [   24.990636]  [] warn_slowpath_null+0x15/0x20
> [   24.992658]  [] anon_vma_interval_tree_verify+0x81/0xa0
> [   24.994980]  [] validate_mm+0x58/0x1e0
> [   24.996772]  [] vma_link+0x94/0xe0
> [   24.997719]  [] copy_vma+0x279/0x2e0
> [   24.998522]  [] ? trace_hardirqs_off+0xd/0x10
> [   25.000772]  [] move_vma+0xa9/0x260
> [   25.002499]  [] sys_mremap+0x475/0x540
> [   25.004364]  [] tracesys+0xe1/0xe6
> [   25.006108] ---[ end trace 7c901670963aa6e2 ]---
> 
> The code line is
> 
> WARN_ON_ONCE(node->cached_vma_last != avc_last_pgoff(node));
> 

The second WARN in the function also triggers once in a while:

[   18.360283] ------------[ cut here ]------------
[   18.360289] WARNING: at mm/interval_tree.c:109 anon_vma_interval_tree_verify+0x36/0xa0()
[   18.360292] Pid: 5694, comm: trinity-child15 Tainted: GW 3.6.0-rc5-next-20120914-sasha-3-g7deb7fa-dirty #335
[   18.360293] Call Trace:
[   18.360297]  [] ? anon_vma_interval_tree_verify+0x36/0xa0
[   18.360300]  [] warn_slowpath_common+0x86/0xb0
[   18.360303]  [] warn_slowpath_null+0x15/0x20
[   18.360305]  [] anon_vma_interval_tree_verify+0x36/0xa0
[   18.360309]  [] validate_mm+0x58/0x1e0
[   18.360312]  [] vma_link+0x94/0xe0
[   18.360315]  [] copy_vma+0x279/0x2e0
[   18.360319]  [] ? trace_hardirqs_off+0xd/0x10
[   18.360322]  [] move_vma+0xa9/0x260
[   18.360326]  [] sys_mremap+0x475/0x540
[   18.360330]  [] tracesys+0xe1/0xe6
[   18.360332] ---[ end trace de862a218d00cefd ]---


Re: [PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-14 Thread Sasha Levin
On 09/04/2012 11:20 AM, Michel Lespinasse wrote:
> Add a CONFIG_DEBUG_VM_RB build option for the previously existing
> DEBUG_MM_RB code. Now that Andi Kleen modified it to avoid using
> recursive algorithms, we can expose it a bit more.
> 
> Also extend this code to validate_mm() after stack expansion, and to
> check that the vma's start and last pgoffs have not changed since the
> nodes were inserted on the anon vma interval tree (as it is important
> that the nodes be reindexed after each such update).

This patch exposes the following warning:

[   24.977502] ------------[ cut here ]------------
[   24.979089] WARNING: at mm/interval_tree.c:110
anon_vma_interval_tree_verify+0x81/0xa0()
[   24.981765] Pid: 5928, comm: trinity-child37 Tainted: G        W
3.6.0-rc5-next-20120914-sasha-00003-g7deb7fa-dirty #333
[   24.985501] Call Trace:
[   24.986345]  [<ffffffff81224c91>] ? anon_vma_interval_tree_verify+0x81/0xa0
[   24.988535]  [<ffffffff81106766>] warn_slowpath_common+0x86/0xb0
[   24.990636]  [<ffffffff81106855>] warn_slowpath_null+0x15/0x20
[   24.992658]  [<ffffffff81224c91>] anon_vma_interval_tree_verify+0x81/0xa0
[   24.994980]  [<ffffffff8122e6e8>] validate_mm+0x58/0x1e0
[   24.996772]  [<ffffffff8122e934>] vma_link+0x94/0xe0
[   24.997719]  [<ffffffff812315e9>] copy_vma+0x279/0x2e0
[   24.998522]  [<ffffffff8117a7fd>] ? trace_hardirqs_off+0xd/0x10
[   25.000772]  [<ffffffff81232e89>] move_vma+0xa9/0x260
[   25.002499]  [<ffffffff812334b5>] sys_mremap+0x475/0x540
[   25.004364]  [<ffffffff8374b6e8>] tracesys+0xe1/0xe6
[   25.006108] ---[ end trace 7c901670963aa6e2 ]---

The code line is

WARN_ON_ONCE(node->cached_vma_last != avc_last_pgoff(node));


Re: [PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-14 Thread Michel Lespinasse
On Fri, Sep 14, 2012 at 3:14 PM, Sasha Levin <levinsasha...@gmail.com> wrote:
> On 09/04/2012 11:20 AM, Michel Lespinasse wrote:
>> Add a CONFIG_DEBUG_VM_RB build option for the previously existing
>> DEBUG_MM_RB code. Now that Andi Kleen modified it to avoid using
>> recursive algorithms, we can expose it a bit more.
>>
>> Also extend this code to validate_mm() after stack expansion, and to
>> check that the vma's start and last pgoffs have not changed since the
>> nodes were inserted on the anon vma interval tree (as it is important
>> that the nodes be reindexed after each such update).
>
> This patch exposes the following warning:
>
> [   24.977502] ------------[ cut here ]------------
> [   24.979089] WARNING: at mm/interval_tree.c:110
> anon_vma_interval_tree_verify+0x81/0xa0()
> [   24.981765] Pid: 5928, comm: trinity-child37 Tainted: G        W
> 3.6.0-rc5-next-20120914-sasha-00003-g7deb7fa-dirty #333
> [   24.985501] Call Trace:
> [   24.986345]  [<ffffffff81224c91>] ? anon_vma_interval_tree_verify+0x81/0xa0
> [   24.988535]  [<ffffffff81106766>] warn_slowpath_common+0x86/0xb0
> [   24.990636]  [<ffffffff81106855>] warn_slowpath_null+0x15/0x20
> [   24.992658]  [<ffffffff81224c91>] anon_vma_interval_tree_verify+0x81/0xa0
> [   24.994980]  [<ffffffff8122e6e8>] validate_mm+0x58/0x1e0
> [   24.996772]  [<ffffffff8122e934>] vma_link+0x94/0xe0
> [   24.997719]  [<ffffffff812315e9>] copy_vma+0x279/0x2e0
> [   24.998522]  [<ffffffff8117a7fd>] ? trace_hardirqs_off+0xd/0x10
> [   25.000772]  [<ffffffff81232e89>] move_vma+0xa9/0x260
> [   25.002499]  [<ffffffff812334b5>] sys_mremap+0x475/0x540
> [   25.004364]  [<ffffffff8374b6e8>] tracesys+0xe1/0xe6
> [   25.006108] ---[ end trace 7c901670963aa6e2 ]---
>
> The code line is
>
> WARN_ON_ONCE(node->cached_vma_last != avc_last_pgoff(node));

That's very interesting (and potentially relevant to another bug
that's been reported too).

I'd like to know, what workload did you use that triggered this ?
(I find it hard to test mremap as I don't know of enough users of it)
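
For anyone who wants to exercise the same sys_mremap -> move_vma -> copy_vma
path by hand, here is a minimal userspace sketch. It is not the workload that
produced the trace above (the comm field there is trinity-child37), and
sketch_force_mremap_move() is a made-up helper name. It assumes Linux with
glibc's mremap() and relies on MREMAP_FIXED to make the kernel relocate the
mapping rather than grow it in place.

```c
#define _GNU_SOURCE		/* for mremap() and MREMAP_* in <sys/mman.h> */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): move a two-page anonymous
 * mapping to a new four-page location. Returns 0 on success, -1 on failure.
 */
static int sketch_force_mremap_move(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t old_len, new_len;
	char *old, *dest, *moved;
	int ok;

	if (page <= 0)
		return -1;
	old_len = 2 * (size_t)page;
	new_len = 4 * (size_t)page;

	old = mmap(NULL, old_len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (old == MAP_FAILED)
		return -1;
	/* Touch the pages so the mapping has anonymous memory behind it. */
	memset(old, 0x5a, old_len);

	/* Reserve a destination range; MREMAP_FIXED replaces whatever is there. */
	dest = mmap(NULL, new_len, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dest == MAP_FAILED) {
		munmap(old, old_len);
		return -1;
	}

	moved = mremap(old, old_len, new_len,
		       MREMAP_MAYMOVE | MREMAP_FIXED, dest);
	if (moved == MAP_FAILED) {
		munmap(dest, new_len);
		munmap(old, old_len);
		return -1;
	}

	/* The old contents must have travelled with the mapping. */
	ok = moved == dest && moved[0] == 0x5a && moved[old_len - 1] == 0x5a;
	munmap(moved, new_len);
	return ok ? 0 : -1;
}
```

Calling sketch_force_mremap_move() from main() on a kernel built with
CONFIG_DEBUG_VM_RB and then checking dmesg would be one way to look for the
warning; whether it fires depends on the kernel under test.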

Thanks,

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


Re: [PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-14 Thread Michel Lespinasse
On Fri, Sep 14, 2012 at 3:46 PM, Michel Lespinasse <wal...@google.com> wrote:
> On Fri, Sep 14, 2012 at 3:14 PM, Sasha Levin <levinsasha...@gmail.com> wrote:
>> On 09/04/2012 11:20 AM, Michel Lespinasse wrote:
>>> Add a CONFIG_DEBUG_VM_RB build option for the previously existing
>>> DEBUG_MM_RB code. Now that Andi Kleen modified it to avoid using
>>> recursive algorithms, we can expose it a bit more.
>>>
>>> Also extend this code to validate_mm() after stack expansion, and to
>>> check that the vma's start and last pgoffs have not changed since the
>>> nodes were inserted on the anon vma interval tree (as it is important
>>> that the nodes be reindexed after each such update).
>>
>> This patch exposes the following warning:
>>
>> [   24.977502] ------------[ cut here ]------------
>> [   24.979089] WARNING: at mm/interval_tree.c:110
>> anon_vma_interval_tree_verify+0x81/0xa0()
>> [   24.981765] Pid: 5928, comm: trinity-child37 Tainted: G        W
>> 3.6.0-rc5-next-20120914-sasha-00003-g7deb7fa-dirty #333
>> [   24.985501] Call Trace:
>> [   24.986345]  [<ffffffff81224c91>] ? anon_vma_interval_tree_verify+0x81/0xa0
>> [   24.988535]  [<ffffffff81106766>] warn_slowpath_common+0x86/0xb0
>> [   24.990636]  [<ffffffff81106855>] warn_slowpath_null+0x15/0x20
>> [   24.992658]  [<ffffffff81224c91>] anon_vma_interval_tree_verify+0x81/0xa0
>> [   24.994980]  [<ffffffff8122e6e8>] validate_mm+0x58/0x1e0
>> [   24.996772]  [<ffffffff8122e934>] vma_link+0x94/0xe0
>> [   24.997719]  [<ffffffff812315e9>] copy_vma+0x279/0x2e0
>> [   24.998522]  [<ffffffff8117a7fd>] ? trace_hardirqs_off+0xd/0x10
>> [   25.000772]  [<ffffffff81232e89>] move_vma+0xa9/0x260
>> [   25.002499]  [<ffffffff812334b5>] sys_mremap+0x475/0x540
>> [   25.004364]  [<ffffffff8374b6e8>] tracesys+0xe1/0xe6
>> [   25.006108] ---[ end trace 7c901670963aa6e2 ]---
>>
>> The code line is
>>
>> WARN_ON_ONCE(node->cached_vma_last != avc_last_pgoff(node));
>
> That's very interesting (and potentially relevant to another bug
> that's been reported too).
>
> I'd like to know, what workload did you use that triggered this ?
> (I find it hard to test mremap as I don't know of enough users of it)
All right. Hugh managed to reproduce the issue on his suse laptop, and
I came up with a fix.

The problem was that in mremap, the new vma's vm_{start,end,pgoff}
fields need to be updated before calling anon_vma_clone() so that the
new vma will be properly indexed.
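
The ordering problem can be modelled outside the kernel. The following is only
a sketch: the sketch_* types and helpers are invented stand-ins for
vm_area_struct, anon_vma_chain and the interval tree insert, a 4 KiB page size
is assumed, and the pgoff formulas are an assumption about what
avc_start_pgoff()/avc_last_pgoff() compute. It shows that keys cached at insert
time go stale if the vma's position is set only after it has been indexed.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Simplified stand-ins for vm_area_struct and anon_vma_chain. */
struct sketch_vma {
	unsigned long vm_start, vm_end, vm_pgoff;
};

struct sketch_node {
	const struct sketch_vma *vma;
	unsigned long cached_start, cached_last;	/* keys seen at insert time */
};

static unsigned long sketch_start_pgoff(const struct sketch_vma *v)
{
	return v->vm_pgoff;
}

static unsigned long sketch_last_pgoff(const struct sketch_vma *v)
{
	return v->vm_pgoff + ((v->vm_end - v->vm_start) >> SKETCH_PAGE_SHIFT) - 1;
}

/* Stands in for indexing the node on the interval tree. */
static void sketch_insert(struct sketch_node *n, const struct sketch_vma *v)
{
	n->vma = v;
	n->cached_start = sketch_start_pgoff(v);
	n->cached_last = sketch_last_pgoff(v);
}

/* Models the two WARN_ON_ONCE() checks: true while the keys are still valid. */
static bool sketch_verify(const struct sketch_node *n)
{
	return n->cached_start == sketch_start_pgoff(n->vma) &&
	       n->cached_last == sketch_last_pgoff(n->vma);
}

static void sketch_set_position(struct sketch_vma *v, unsigned long addr,
				unsigned long len, unsigned long pgoff)
{
	v->vm_start = addr;
	v->vm_end = addr + len;
	v->vm_pgoff = pgoff;
}

/* Copy a vma, either positioning it before indexing (fixed) or after (buggy). */
static bool sketch_copy_vma(const struct sketch_vma *old, unsigned long addr,
			    unsigned long len, unsigned long pgoff,
			    bool position_first)
{
	struct sketch_vma new_vma = *old;
	struct sketch_node node;

	if (position_first) {
		sketch_set_position(&new_vma, addr, len, pgoff);
		sketch_insert(&node, &new_vma);
	} else {
		sketch_insert(&node, &new_vma);	/* indexed with the old keys */
		sketch_set_position(&new_vma, addr, len, pgoff);
	}
	return sketch_verify(&node);
}
```

With a two-page source vma moved to a four-page destination, the buggy order
fails verification on the last pgoff, while the fixed order passes. When the
length and pgoff happen to be unchanged, the buggy order goes unnoticed in this
model, so not every mremap() would trip the check.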

Patch attached. I expect this should also explain Jiri's reported
failure involving splitting THP pages during mremap(), even though we
did not manage to reproduce that one.

-8---

From: Michel Lespinasse <wal...@google.com>
Date: Fri, 14 Sep 2012 16:43:49 -0700
Subject: [PATCH] mm anon rmap: in mremap, set the new vma's position before
 anon_vma_clone()

anon_vma_clone() expects new_vma->vm_{start,end,pgoff} to be correctly set
so that the new vma can be indexed on the anon interval tree.

copy_vma() was failing to do that, which broke mremap().

Signed-off-by: Michel Lespinasse <wal...@google.com>

---
 mm/mmap.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index cc8c64077a42..7e672800b5d4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2446,16 +2446,16 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 		new_vma = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 		if (new_vma) {
 			*new_vma = *vma;
+			new_vma->vm_start = addr;
+			new_vma->vm_end = addr + len;
+			new_vma->vm_pgoff = pgoff;
 			pol = mpol_dup(vma_policy(vma));
 			if (IS_ERR(pol))
 				goto out_free_vma;
+			vma_set_policy(new_vma, pol);
 			INIT_LIST_HEAD(&new_vma->anon_vma_chain);
 			if (anon_vma_clone(new_vma, vma))
 				goto out_free_mempol;
-			vma_set_policy(new_vma, pol);
-			new_vma->vm_start = addr;
-			new_vma->vm_end = addr + len;
-			new_vma->vm_pgoff = pgoff;
 			if (new_vma->vm_file) {
 				get_file(new_vma->vm_file);
 
-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


[PATCH 6/7] mm: add CONFIG_DEBUG_VM_RB build option

2012-09-04 Thread Michel Lespinasse
Add a CONFIG_DEBUG_VM_RB build option for the previously existing
DEBUG_MM_RB code. Now that Andi Kleen modified it to avoid using
recursive algorithms, we can expose it a bit more.

Also extend this code to validate_mm() after stack expansion, and to
check that the vma's start and last pgoffs have not changed since the
nodes were inserted on the anon vma interval tree (as it is important
that the nodes be reindexed after each such update).

Signed-off-by: Michel Lespinasse <wal...@google.com>
---
 include/linux/mm.h   |    3 +++
 include/linux/rmap.h |    3 +++
 lib/Kconfig.debug    |    9 +++++++++
 mm/interval_tree.c   |   41 ++++++++++++++++++++++++++++++++++++++++-
 mm/mmap.c            |   19 +++++++++----------
 5 files changed, 64 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 19d63ec2cbbb..1a2b1a44bd4e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1367,6 +1367,9 @@ struct anon_vma_chain *anon_vma_interval_tree_iter_first(
 	struct rb_root *root, unsigned long start, unsigned long last);
 struct anon_vma_chain *anon_vma_interval_tree_iter_next(
 	struct anon_vma_chain *node, unsigned long start, unsigned long last);
+#ifdef CONFIG_DEBUG_VM_RB
+void anon_vma_interval_tree_verify(struct anon_vma_chain *node);
+#endif
 
 #define anon_vma_interval_tree_foreach(avc, root, start, last)		 \
 	for (avc = anon_vma_interval_tree_iter_first(root, start, last); \
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index dce44f7d3ed8..b2cce644ffc7 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -66,6 +66,9 @@ struct anon_vma_chain {
 	struct list_head same_vma;   /* locked by mmap_sem & page_table_lock */
 	struct rb_node rb;			/* locked by anon_vma->mutex */
 	unsigned long rb_subtree_last;
+#ifdef CONFIG_DEBUG_VM_RB
+	unsigned long cached_vma_start, cached_vma_last;
+#endif
 };
 
 #ifdef CONFIG_MMU
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index eba4b0961187..d261b4555dc5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -781,6 +781,15 @@ config DEBUG_VM
 
 	  If unsure, say N.
 
+config DEBUG_VM_RB
+	bool "Debug VM red-black trees"
+	depends on DEBUG_VM
+	help
+	  Enable this to turn on more extended checks in the virtual-memory
+	  system that may impact performance.
+
+	  If unsure, say N.
+
 config DEBUG_VIRTUAL
 	bool "Debug VM translations"
 	depends on DEBUG_KERNEL && X86
diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index f7c72cd35e1d..4a5822a586e6 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -70,4 +70,43 @@ static inline unsigned long avc_last_pgoff(struct anon_vma_chain *avc)
 }
 
 INTERVAL_TREE_DEFINE(struct anon_vma_chain, rb, unsigned long, rb_subtree_last,
-		     avc_start_pgoff, avc_last_pgoff,, anon_vma_interval_tree)
+		     avc_start_pgoff, avc_last_pgoff,
+		     static inline, __anon_vma_interval_tree)
+
+void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
+				   struct rb_root *root)
+{
+#ifdef CONFIG_DEBUG_VM_RB
+	node->cached_vma_start = avc_start_pgoff(node);
+	node->cached_vma_last = avc_last_pgoff(node);
+#endif
+	__anon_vma_interval_tree_insert(node, root);
+}
+
+void anon_vma_interval_tree_remove(struct anon_vma_chain *node,
+				   struct rb_root *root)
+{
+	__anon_vma_interval_tree_remove(node, root);
+}
+
+struct anon_vma_chain *
+anon_vma_interval_tree_iter_first(struct rb_root *root,
+				  unsigned long first, unsigned long last)
+{
+	return __anon_vma_interval_tree_iter_first(root, first, last);
+}
+
+struct anon_vma_chain *
+anon_vma_interval_tree_iter_next(struct anon_vma_chain *node,
+				 unsigned long first, unsigned long last)
+{
+	return __anon_vma_interval_tree_iter_next(node, first, last);
+}
+
+#ifdef CONFIG_DEBUG_VM_RB
+void anon_vma_interval_tree_verify(struct anon_vma_chain *node)
+{
+	WARN_ON_ONCE(node->cached_vma_start != avc_start_pgoff(node));
+	WARN_ON_ONCE(node->cached_vma_last != avc_last_pgoff(node));
+}
+#endif
diff --git a/mm/mmap.c b/mm/mmap.c
index 1a6afdb5194a..884bda4cd3ea 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -51,12 +51,6 @@ static void unmap_region(struct mm_struct *mm,
 		struct vm_area_struct *vma, struct vm_area_struct *prev,
 		unsigned long start, unsigned long end);
 
-/*
- * WARNING: the debugging will use recursive algorithms so never enable this
- * unless you know what you are doing.
- */
-#undef DEBUG_MM_RB
-
 /* description of effects of mapping type and prot in current implementation.
  * this is due to the limited x86 page protection hardware.  The expected
  * behavior is in parens:
@@ -306,7 +300,7 @@ out:
 	return retval;
 }
 
-#ifdef DEBUG_MM_RB
+#ifdef 
