Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-23 Thread Mel Gorman
On Fri, Mar 20, 2015 at 10:02:23AM -0700, Linus Torvalds wrote: On Thu, Mar 19, 2015 at 9:13 PM, Dave Chinner da...@fromorbit.com wrote: Testing now. It's a bit faster - three runs gave 7m35s, 7m20s and 7m36s. IOWs, it's a bit better, but not significantly. page migrations are pretty much

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-20 Thread Mel Gorman
On Thu, Mar 19, 2015 at 06:29:47PM -0700, Linus Torvalds wrote: And the VM_WRITE test should be stable and not have any subtle interaction with the other changes that the numa pte things introduced. It would be good to see if the profiles then pop something *else* up as the performance

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-20 Thread Mel Gorman
On Thu, Mar 19, 2015 at 04:05:46PM -0700, Linus Torvalds wrote: On Thu, Mar 19, 2015 at 3:41 PM, Dave Chinner da...@fromorbit.com wrote: My recollection wasn't faulty - I pulled it from an earlier email. That said, the original measurement might have been faulty. I ran the numbers again

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-20 Thread Linus Torvalds
On Thu, Mar 19, 2015 at 9:13 PM, Dave Chinner da...@fromorbit.com wrote: Testing now. It's a bit faster - three runs gave 7m35s, 7m20s and 7m36s. IOWs, it's a bit better, but not significantly. page migrations are pretty much unchanged, too: 558,632 migrate:mm_migrate_pages ( +-

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-19 Thread Dave Chinner
On Thu, Mar 19, 2015 at 02:41:48PM -0700, Linus Torvalds wrote: On Wed, Mar 18, 2015 at 10:31 AM, Linus Torvalds torva...@linux-foundation.org wrote: So I think there's something I'm missing. For non-shared mappings, I still have the idea that pte_dirty should be the same as pte_write.
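
The reasoning behind that expectation: for a private mapping, the only path that makes a pte writable is the write fault, and that path dirties the pte at the same time, so dirty and writable should travel together. A minimal sketch of the pattern, approximated from the mm/memory.c write-fault handling of that era (names and context reconstructed, not quoted from the thread):

	/* write fault on a private mapping: the pte becomes young,
	 * dirty and (if the vma allows it) writable in one step */
	entry = pte_mkyoung(orig_pte);
	entry = maybe_mkwrite(pte_mkdirty(entry), vma);

The cases where the two diverge - writable-but-clean ptes, for example writable shared memory that has been mapped but never written to - are what the rest of the thread digs into.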

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-19 Thread Dave Chinner
On Thu, Mar 19, 2015 at 04:05:46PM -0700, Linus Torvalds wrote: On Thu, Mar 19, 2015 at 3:41 PM, Dave Chinner da...@fromorbit.com wrote: My recollection wasn't faulty - I pulled it from an earlier email. That said, the original measurement might have been faulty. I ran the numbers again

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-19 Thread Dave Chinner
On Thu, Mar 19, 2015 at 04:05:46PM -0700, Linus Torvalds wrote: Can you try Mel's change to make it use if (!(vma->vm_flags & VM_WRITE)) instead of the pte details? Again, on otherwise plain 3.19, just so that we have a baseline. I'd be *so* much happier with checking the vma details

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-19 Thread Linus Torvalds
On Thu, Mar 19, 2015 at 5:23 PM, Dave Chinner da...@fromorbit.com wrote: Bit more variance there than the pte checking, but runtime difference is in the noise - 5m4s vs 4m54s - and profiles are identical to the pte checking version. Ahh, so that !(vma->vm_flags & VM_WRITE) test works _almost_

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-19 Thread Dave Chinner
On Thu, Mar 19, 2015 at 06:29:47PM -0700, Linus Torvalds wrote: On Thu, Mar 19, 2015 at 5:23 PM, Dave Chinner da...@fromorbit.com wrote: Bit more variance there than the pte checking, but runtime difference is in the noise - 5m4s vs 4m54s - and profiles are identical to the pte checking

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-19 Thread Linus Torvalds
On Thu, Mar 19, 2015 at 3:41 PM, Dave Chinner da...@fromorbit.com wrote: My recollection wasn't faulty - I pulled it from an earlier email. That said, the original measurement might have been faulty. I ran the numbers again on the 3.19 kernel I saved away from the original testing. That came

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-19 Thread Mel Gorman
On Wed, Mar 18, 2015 at 10:31:28AM -0700, Linus Torvalds wrote: - something completely different that I am entirely missing So I think there's something I'm missing. For non-shared mappings, I still have the idea that pte_dirty should be the same as pte_write. And yet, your testing of 3.19

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-19 Thread Linus Torvalds
On Thu, Mar 19, 2015 at 7:10 AM, Mel Gorman mgor...@suse.de wrote: - if (!pmd_dirty(pmd)) + /* See similar comment in do_numa_page for explanation */ + if (!(vma->vm_flags & VM_WRITE)) Yeah, that would certainly be a whole lot more obvious than all the if this particular
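
Reassembled, the hunk Linus is reacting to reads as below; the TNF_NO_GROUP context line is an assumption based on the surrounding discussion of the do_huge_pmd_numa_page grouping test, not quoted in this snippet:

	-	if (!pmd_dirty(pmd))
	+	/* See similar comment in do_numa_page for explanation */
	+	if (!(vma->vm_flags & VM_WRITE))
	 		flags |= TNF_NO_GROUP;

The point of the change is to base the grouping decision on the stable VMA flag rather than on pte/pmd bits that the PROT_NONE-based NUMA hinting can leave in a misleading state.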

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-19 Thread Linus Torvalds
On Wed, Mar 18, 2015 at 10:31 AM, Linus Torvalds torva...@linux-foundation.org wrote: So I think there's something I'm missing. For non-shared mappings, I still have the idea that pte_dirty should be the same as pte_write. And yet, your testing of 3.19 shows that it's a big difference.

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-18 Thread Linus Torvalds
On Tue, Mar 17, 2015 at 3:08 PM, Dave Chinner da...@fromorbit.com wrote: Damn. From a performance number standpoint, it looked like we zoomed in on the right thing. But now it's migrating even more pages than before. Odd. Throttling problem, like Mel originally suspected? That doesn't much

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-18 Thread Linus Torvalds
On Wed, Mar 18, 2015 at 9:08 AM, Linus Torvalds torva...@linux-foundation.org wrote: So why am I wrong? Why is testing for dirty not the same as testing for writable? I can see a few cases: - your load has lots of writable (but not written-to) shared memory Hmm. I tried to look at the

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-18 Thread Dave Chinner
On Wed, Mar 18, 2015 at 10:31:28AM -0700, Linus Torvalds wrote: On Wed, Mar 18, 2015 at 9:08 AM, Linus Torvalds torva...@linux-foundation.org wrote: So why am I wrong? Why is testing for dirty not the same as testing for writable? I can see a few cases: - your load has lots of

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-17 Thread Dave Chinner
On Tue, Mar 17, 2015 at 02:30:57PM -0700, Linus Torvalds wrote: On Tue, Mar 17, 2015 at 1:51 PM, Dave Chinner da...@fromorbit.com wrote: On the -o ag_stride=-1 -o bhash=101073 config, the 60s perf stat I was using during steady state shows: 471,752 migrate:mm_migrate_pages (

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-17 Thread Linus Torvalds
On Tue, Mar 17, 2015 at 1:51 PM, Dave Chinner da...@fromorbit.com wrote: On the -o ag_stride=-1 -o bhash=101073 config, the 60s perf stat I was using during steady state shows: 471,752 migrate:mm_migrate_pages ( +- 7.38% ) The migrate pages rate is even higher than in 4.0-rc1

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-17 Thread Linus Torvalds
On Tue, Mar 17, 2015 at 12:06 AM, Dave Chinner da...@fromorbit.com wrote: To close the loop here, now I'm back home and can run tests:
    config            3.19     4.0-rc1    4.0-rc4
    defaults          8m08s    9m34s      9m14s
    -o ag_stride=-1

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-12 Thread Mel Gorman
On Tue, Mar 10, 2015 at 04:55:52PM -0700, Linus Torvalds wrote: On Mon, Mar 9, 2015 at 12:19 PM, Dave Chinner da...@fromorbit.com wrote: On Mon, Mar 09, 2015 at 09:52:18AM -0700, Linus Torvalds wrote: What's your virtual environment setup? Kernel config, and virtualization environment to

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-12 Thread Linus Torvalds
On Thu, Mar 12, 2015 at 6:10 AM, Mel Gorman mgor...@suse.de wrote: I believe you're correct and it matches what was observed. I'm still travelling and wireless is dirt but managed to queue a test using pmd_dirty Ok, thanks. I'm not entirely happy with that change, and I suspect the whole

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-12 Thread Mel Gorman
On Thu, Mar 12, 2015 at 09:20:36AM -0700, Linus Torvalds wrote: On Thu, Mar 12, 2015 at 6:10 AM, Mel Gorman mgor...@suse.de wrote: I believe you're correct and it matches what was observed. I'm still travelling and wireless is dirt but managed to queue a test using pmd_dirty Ok, thanks.

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-10 Thread Dave Chinner
On Mon, Mar 09, 2015 at 09:52:18AM -0700, Linus Torvalds wrote: On Mon, Mar 9, 2015 at 4:29 AM, Dave Chinner da...@fromorbit.com wrote: Also, is there some sane way for me to actually see this behavior on a regular machine with just a single socket? Dave is apparently running in some

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-10 Thread Mel Gorman
On Mon, Mar 09, 2015 at 09:02:19PM +, Mel Gorman wrote: On Sun, Mar 08, 2015 at 08:40:25PM +, Mel Gorman wrote: Because if the answer is 'yes', then we can safely say: 'we regressed performance because correctness [not dropping dirty bits] comes before performance'. If

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-10 Thread Linus Torvalds
On Mon, Mar 9, 2015 at 12:19 PM, Dave Chinner da...@fromorbit.com wrote: On Mon, Mar 09, 2015 at 09:52:18AM -0700, Linus Torvalds wrote: What's your virtual environment setup? Kernel config, and virtualization environment to actually get that odd fake NUMA thing happening? I don't have the

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-09 Thread Dave Chinner
On Sun, Mar 08, 2015 at 11:35:59AM -0700, Linus Torvalds wrote: On Sun, Mar 8, 2015 at 3:02 AM, Ingo Molnar mi...@kernel.org wrote: But: As a second hack (not to be applied), could we change: #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL to: #define _PAGE_BIT_PROTNONE

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-09 Thread Linus Torvalds
On Mon, Mar 9, 2015 at 4:29 AM, Dave Chinner da...@fromorbit.com wrote: Also, is there some sane way for me to actually see this behavior on a regular machine with just a single socket? Dave is apparently running in some fake-numa setup, I'm wondering if this is easy enough to reproduce that

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-09 Thread Mel Gorman
On Sun, Mar 08, 2015 at 08:40:25PM +, Mel Gorman wrote: Because if the answer is 'yes', then we can safely say: 'we regressed performance because correctness [not dropping dirty bits] comes before performance'. If the answer is 'no', then we still have a mystery (and a regression)

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-08 Thread Mel Gorman
On Sun, Mar 08, 2015 at 11:02:23AM +0100, Ingo Molnar wrote: * Linus Torvalds torva...@linux-foundation.org wrote: On Sat, Mar 7, 2015 at 8:36 AM, Ingo Molnar mi...@kernel.org wrote: And the patch Dave bisected to is a relatively simple patch. Why not simply revert it to see

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-08 Thread Linus Torvalds
On Sun, Mar 8, 2015 at 3:02 AM, Ingo Molnar mi...@kernel.org wrote: Well, there's a difference in what we write to the pte: #define _PAGE_BIT_NUMA (_PAGE_BIT_GLOBAL+1) #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL and our expectation was that the two should be equivalent

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-08 Thread Linus Torvalds
On Sun, Mar 8, 2015 at 11:35 AM, Linus Torvalds torva...@linux-foundation.org wrote: As a second hack (not to be applied), could we change: #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL to: #define _PAGE_BIT_PROTNONE (_PAGE_BIT_GLOBAL+1) to double check that the position of the
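
Spelled out, the two variants being compared are (taken from the defines quoted above, arch/x86 pgtable_types.h style):

	/* current: the PROT_NONE marker aliases the Global bit */
	#define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL

	/* experiment (not to be applied): move the marker off the
	 * Global bit to rule out the aliasing as the cause */
	#define _PAGE_BIT_PROTNONE	(_PAGE_BIT_GLOBAL + 1)

Since the old _PAGE_BIT_NUMA was (_PAGE_BIT_GLOBAL+1), the experiment makes the new PROT_NONE-based scheme write the same bit position to the pte that the pre-4.0 NUMA scheme did, isolating the choice of bit from the rest of the rework.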

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-08 Thread Ingo Molnar
* Mel Gorman mgor...@suse.de wrote: Elapsed time is primarily worse on one benchmark -- numa01, which is an adverse workload. The user time differences are also dominated by that benchmark:
                         4.0.0-rc1        4.0.0-rc1        3.19.0

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-08 Thread Ingo Molnar
* Mel Gorman mgor...@suse.de wrote:
    xfsrepair            4.0.0-rc1          4.0.0-rc1        3.19.0
                         vanilla            slowscan-v2      vanilla
    Min real-fsmark      1157.41 ( 0.00%)   1150.38 (

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-08 Thread Ingo Molnar
* Linus Torvalds torva...@linux-foundation.org wrote: On Sat, Mar 7, 2015 at 8:36 AM, Ingo Molnar mi...@kernel.org wrote: And the patch Dave bisected to is a relatively simple patch. Why not simply revert it to see whether that cures much of the problem? So the problem with that is

[PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-07 Thread Mel Gorman
Dave Chinner reported the following on https://lkml.org/lkml/2015/3/1/226 Across the board the 4.0-rc1 numbers are much slower, and the degradation is far worse when using the large memory footprint configs. Perf points straight at the cause - this is from 4.0-rc1 on the -o bhash=101073 config:
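
The mechanism of the patch itself is small: failed migrations are counted alongside the hinting-fault locality statistics, and the scan period backs off when migrations keep failing. A sketch of the shape of the change in update_task_scan_period() in kernel/sched/fair.c (reconstructed, not quoted from the patch; the array slot used for the failure count is an assumption):

	/*
	 * If there were no recorded hinting faults, the task is idle or
	 * touching areas of no interest. If migrations are failing, we
	 * are migrating too quickly or the local node is overloaded.
	 * In either case, scan more slowly.
	 */
	if (local + shared == 0 || p->numa_faults_locality[2]) {
		p->numa_scan_period = min(p->numa_scan_period_max,
					  p->numa_scan_period << 1);
		return;
	}

That backoff is what trades migration bandwidth for scan latency, and it is the behaviour the rest of this thread benchmarks.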

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-07 Thread Linus Torvalds
On Sat, Mar 7, 2015 at 8:36 AM, Ingo Molnar mi...@kernel.org wrote: And the patch Dave bisected to is a relatively simple patch. Why not simply revert it to see whether that cures much of the problem? So the problem with that is that pmd_set_numa() and friends simply no longer exist. So we

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-07 Thread Mel Gorman
On Sat, Mar 07, 2015 at 05:36:58PM +0100, Ingo Molnar wrote: * Mel Gorman mgor...@suse.de wrote: Dave Chinner reported the following on https://lkml.org/lkml/2015/3/1/226 Across the board the 4.0-rc1 numbers are much slower, and the degradation is far worse when using the large

Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

2015-03-07 Thread Ingo Molnar
* Mel Gorman mgor...@suse.de wrote: Dave Chinner reported the following on https://lkml.org/lkml/2015/3/1/226 Across the board the 4.0-rc1 numbers are much slower, and the degradation is far worse when using the large memory footprint configs. Perf points straight at the cause - this is