Re: [PATCH 07/10] mm: base LRU balancing on an explicit cost model

2016-06-06 Thread Rik van Riel
On Mon, 2016-06-06 at 15:48 -0400, Johannes Weiner wrote: > Currently, scan pressure between the anon and file LRU lists is > balanced based on a mixture of reclaim efficiency and a somewhat > vague > notion of "value" of having certain pages in memory over others. That > concept of value is proble

Re: [PATCH 06/10] mm: remove unnecessary use-once cache bias from LRU balancing

2016-06-06 Thread Rik van Riel
On Mon, 2016-06-06 at 15:48 -0400, Johannes Weiner wrote: > When the splitlru patches divided page cache and swap-backed pages > into separate LRU lists, the pressure balance between the lists was > biased to account for the fact that streaming IO can cause memory > pressure with a flood of pages t

Re: [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug

2016-06-06 Thread Rik van Riel
On Mon, 2016-06-06 at 15:40 +0200, Paolo Bonzini wrote: > > On 02/06/2016 15:59, Rik van Riel wrote: > > > > If a guest is saved to disk and later restored (eg. after > > a host reboot), or live migrated to another host, I would > > expect to get totally disjo

Re: [PATCH 05/10] mm: remove LRU balancing effect of temporary page isolation

2016-06-06 Thread Rik van Riel
On Mon, 2016-06-06 at 18:15 -0400, Johannes Weiner wrote: > On Mon, Jun 06, 2016 at 05:56:09PM -0400, Rik van Riel wrote: > > > > On Mon, 2016-06-06 at 15:48 -0400, Johannes Weiner wrote: > > > > > >   > > > +void lru_cache_putback(struct page *p

Re: [PATCH 05/10] mm: remove LRU balancing effect of temporary page isolation

2016-06-06 Thread Rik van Riel
On Mon, 2016-06-06 at 15:48 -0400, Johannes Weiner wrote: >  > +void lru_cache_putback(struct page *page) > +{ > + struct pagevec *pvec = &get_cpu_var(lru_putback_pvec); > + > + get_page(page); > + if (!pagevec_space(pvec)) > + __pagevec_lru_add(pvec, false); > + pagevec

Re: [PATCH 04/10] mm: fix LRU balancing effect of new transparent huge pages

2016-06-06 Thread Rik van Riel
ays account THP by the number of basepages, and remove the fixup > from the splitting path. > > Signed-off-by: Johannes Weiner > Reviewed-by: Rik van Riel -- All Rights Reversed.

Re: [PATCH 03/10] mm: fold and remove lru_cache_add_anon() and lru_cache_add_file()

2016-06-06 Thread Rik van Riel
On Mon, 2016-06-06 at 15:48 -0400, Johannes Weiner wrote: > They're the same function, and for the purpose of all callers they > are > equivalent to lru_cache_add(). > > Signed-off-by: Johannes Weiner > Reviewed-by: Rik van Riel -- All Rights Reversed.

Re: [PATCH 02/10] mm: swap: unexport __pagevec_lru_add()

2016-06-06 Thread Rik van Riel
Signed-off-by: Johannes Weiner > Reviewed-by: Rik van Riel -- All Rights Reversed.

Re: [PATCH v2] sched/cputime: add steal clock warp handling

2016-06-06 Thread Rik van Riel
On Mon, 2016-06-06 at 15:44 +0200, Paolo Bonzini wrote: > > On 03/06/2016 15:10, Rik van Riel wrote: > > > > On Fri, 2016-06-03 at 13:21 +0800, Wanpeng Li wrote: > > > > > > From: Wanpeng Li > > > > > > I observed that sometimes st is 100%

Re: [PATCH v2] sched/cputime: add steal clock warp handling

2016-06-03 Thread Rik van Riel
left), ignoring > intervals  > that are negative or longer than a second, and using those to sync > up  > the guest with the host. > > Cc: Ingo Molnar > Cc: Peter Zijlstra (Intel) > Cc: Rik van Riel > Cc: Thomas Gleixner > Cc: Frederic Weisbecker > Cc: Paolo B

Re: [PATCH] sched/cputime: add steal clock warps handling during cpu hotplug

2016-06-02 Thread Rik van Riel
On Thu, 2016-06-02 at 14:00 +0200, Peter Zijlstra wrote: > On Thu, Jun 02, 2016 at 07:57:19PM +0800, Wanpeng Li wrote: > > > > From: Wanpeng Li > > > > I observed that sometimes st is 100% instantaneous, then idle is > > 100%  > > even if there is a cpu hog on the guest cpu after the cpu hotplug

Re: [PATCH] sched/cputime: Fix steal time accouting during cpu hotplug

2016-06-02 Thread Rik van Riel
xes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time > accounting vs. CPU hotplug")' > Cc: Ingo Molnar > Cc: Peter Zijlstra (Intel) > Cc: Rik van Riel > Cc: Thomas Gleixner > Cc: Frederic Weisbecker > Cc: Paolo Bonzini > Cc: Radim >

Re: [PATCH 2/2] x86/entry: Inline enter_from_user_mode

2016-06-01 Thread Rik van Riel
On Mon, 2016-05-30 at 14:30 +0200, Paolo Bonzini wrote: > This matches what is already done for prepare_exit_to_usermode, > and saves about 60 clock cycles (4% speedup) with the benchmark > in the previous commit message. > > Cc: Andy Lutomirski > Cc: Peter Zijlstra > Cc:

Re: [PATCH 1/2] x86/entry: Avoid interrupt flag save and restore

2016-06-01 Thread Rik van Riel
er_exit > that skips saving and restoring the interrupt flag. > > On an AMD-based machine I tested this patch on, with force-enabled > context tracking, the speed-up in system calls was 90 clock cycles or > 6%, > measured with the following simple benchmark: > Reviewed-b

Re: [PATCH v3] sched/cputime: add steal time support to full dynticks CPU time accounting

2016-05-24 Thread Rik van Riel
ime is jiffy  > based sampling even if it's still listened to ring boundaries, so  > steal_account_process_tick() is reused to account how much 'ticks'  > are steal time after the last accumulation.  > > Suggested-by: Rik van Riel > Cc: Ingo Molnar > Cc:

Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem

2016-05-23 Thread Rik van Riel
On Mon, 2016-05-23 at 23:02 +0300, Kirill A. Shutemov wrote: > On Mon, May 23, 2016 at 03:26:47PM -0400, Rik van Riel wrote: > > > > On Mon, 2016-05-23 at 22:01 +0300, Kirill A. Shutemov wrote: > > > > > > On Mon, May 23, 2016 at 02:49:09PM -0400, Rik van Riel wr

Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem

2016-05-23 Thread Rik van Riel
On Mon, 2016-05-23 at 22:01 +0300, Kirill A. Shutemov wrote: > On Mon, May 23, 2016 at 02:49:09PM -0400, Rik van Riel wrote: > > > > On Mon, 2016-05-23 at 20:42 +0200, Michal Hocko wrote: > > > > > > On Mon 23-05-16 20:14:11, Ebru Akagunduz wrote: > > >

Re: [PATCH 3/3] mm, thp: make swapin readahead under down_read of mmap_sem

2016-05-23 Thread Rik van Riel
On Mon, 2016-05-23 at 20:42 +0200, Michal Hocko wrote: > On Mon 23-05-16 20:14:11, Ebru Akagunduz wrote: > > > > Currently khugepaged makes swapin readahead under > > down_write. This patch supplies to make swapin > > readahead under down_read instead of down_write. > You are still keeping down_wr

Re: [RFC][PATCH 8/7] sched/fair: Use utilization distance to filter affine sync wakeups

2016-05-19 Thread Rik van Riel
On Wed, 2016-05-18 at 07:51 +0200, Mike Galbraith wrote: > On Mon, 2016-05-09 at 12:48 +0200, Peter Zijlstra wrote: > > Hai, > > (got some of the frozen variety handy?:) > > > here be a semi coherent patch series for the recent > > select_idle_siblings() > > tinkering. Happy benchmarking.. > > A

Re: [PATCH] sched/cputime: add steal time support to full dynticks CPU time accounting

2016-05-18 Thread Rik van Riel
On Tue, 2016-05-10 at 13:34 +0800, Wanpeng Li wrote: >  > +++ b/kernel/sched/cputime.c >  > @@ -691,8 +691,11 @@ static cputime_t get_vtime_delta(struct > task_struct *tsk) >   >  static void __vtime_account_system(struct task_struct *tsk) >  { > + unsigned long steal_time = steal_account_proce

Re: [PATCH] sched/cputime: add steal time support to full dynticks CPU time accounting

2016-05-17 Thread Rik van Riel
pling even if it's  > still listened to ring boundaries, so steal_account_process_tick()  > is reused to account how much 'ticks' are steal time after the  > last accumulation.  > > Suggested-by: Rik van Riel > Cc: Ingo Molnar > Cc: Peter Zijlstra (Intel) >

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-11 Thread Rik van Riel
On Wed, 2016-05-11 at 07:40 -0700, Eric Dumazet wrote: > On Wed, May 11, 2016 at 6:13 AM, Hannes Frederic Sowa > wrote: > > > This looks racy to me as the ksoftirqd could be in the progress to > > stop > > and we would miss another softirq invocation. > > Looking at smpboot_thread_fn(), it looks

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-10 Thread Rik van Riel
On Tue, 2016-05-10 at 14:53 -0700, Eric Dumazet wrote: > On Tue, 2016-05-10 at 17:35 -0400, Rik van Riel wrote: > > > > > You might need another one of these in invoke_softirq() > > > Excellent. > > I gave it a quick try (without your suggestion), and host

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-10 Thread Rik van Riel
On Tue, 2016-05-10 at 14:31 -0700, Eric Dumazet wrote: > On Tue, 2016-05-10 at 14:09 -0700, Eric Dumazet wrote: > > > > On Tue, May 10, 2016 at 1:46 PM, Hannes Frederic Sowa > > wrote: > > > > > > > > I agree here, but I don't think this patch particularly is a lot > > > of > > > bloat and some

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-10 Thread Rik van Riel
On Tue, 2016-05-10 at 16:52 -0400, David Miller wrote: > From: Rik van Riel > Date: Tue, 10 May 2016 16:50:56 -0400 > > > On Tue, 2016-05-10 at 16:45 -0400, David Miller wrote: > >> From: Paolo Abeni > >> Date: Tue, 10 May 2016 22:22:50 +0200 > >>  >

Re: [RFC PATCH 0/2] net: threadable napi poll loop

2016-05-10 Thread Rik van Riel
On Tue, 2016-05-10 at 16:45 -0400, David Miller wrote: > From: Paolo Abeni > Date: Tue, 10 May 2016 22:22:50 +0200 > > > On Tue, 2016-05-10 at 09:08 -0700, Eric Dumazet wrote: > >> On Tue, 2016-05-10 at 18:03 +0200, Paolo Abeni wrote: > >>  > >> > If a single core host is under network flood, i.e

Re: [patch 2/7] lib/hashmod: Add modulo based hash mechanism

2016-04-29 Thread Rik van Riel
On Fri, 2016-04-29 at 16:51 -0700, Linus Torvalds wrote: > There's presumably a few optimal values from a "spread bits out > evenly" standpoint, and they won't have anything to do with random > irrational constants, and will have everything to do with having nice > bitpatterns. > > I'm adding Rik

Re: [RFC] The Linux Scheduler: a Decade of Wasted Cores Report

2016-04-25 Thread Rik van Riel
On Mon, 2016-04-25 at 11:34 +0200, Peter Zijlstra wrote: > On Sat, Apr 23, 2016 at 06:38:25PM -0700, Brendan Gregg wrote: > >  > > Their proof of concept patches are online[1]. I tested them and saw > > 0% > > improvements on the systems I tested, for some simple workloads[2]. > > I > > tested 1 an

Re: [PATCH 4.6] mm: wake kcompactd before kswapd's short sleep

2016-04-20 Thread Rik van Riel
s fully > sleep > until an allocation slowpath wakes it up again. > Reviewed-by: Rik van Riel

Re: [PATCH kernel 1/2] mm: add the related functions to build the free page bitmap

2016-04-19 Thread Rik van Riel
On Tue, 2016-04-19 at 15:02 +, Li, Liang Z wrote: > > > > On Tue, 2016-04-19 at 22:34 +0800, Liang Li wrote: > > > > > > The free page bitmap will be sent to QEMU through virtio > > > interface and > > > used for live migration optimization. > > > Drop the cache before building the free page

Re: [PATCH kernel 1/2] mm: add the related functions to build the free page bitmap

2016-04-19 Thread Rik van Riel
On Tue, 2016-04-19 at 22:34 +0800, Liang Li wrote: > The free page bitmap will be sent to QEMU through virtio interface > and used for live migration optimization. > Drop the cache before building the free page bitmap can get more > free pages. Whether dropping the cache is decided by user. > How

Re: [RFC PATCH] sched/cputime: drop local_irq_safe() in vtime_init_idle()

2016-04-14 Thread Rik van Riel
rved > while not strictly required (given the first part of this commit). > A little later, sched_clock_cpu() was replaced with jiffies via > ff9a9b4c4334 ("sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy > granularity"). > Based on these events I assume it is safe t

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

2016-04-11 Thread Rik van Riel
On Mon, 2016-04-11 at 15:21 +0200, Rafael J. Wysocki wrote: > On Mon, Apr 11, 2016 at 2:38 PM, Rik van Riel > wrote: > > > > On Mon, 2016-04-11 at 05:04 +0200, Mike Galbraith wrote: > > > > > > On Sun, 2016-04-10 at 16:24 -0400, Rik van Riel wrote: > >

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

2016-04-11 Thread Rik van Riel
On Mon, 2016-04-11 at 05:04 +0200, Mike Galbraith wrote: > On Sun, 2016-04-10 at 16:24 -0400, Rik van Riel wrote: > > > > On Sun, 2016-04-10 at 17:39 +0200, Mike Galbraith wrote: > > > > > > > > Should the default idle state not then be governor

Re: [regression] cross core scheduling frequency drop bisected to 0c313cb20732

2016-04-10 Thread Rik van Riel
On Sun, 2016-04-10 at 17:39 +0200, Mike Galbraith wrote: > On Sat, 2016-04-09 at 14:31 +0200, Rafael J. Wysocki wrote: > > > > On Sat, Apr 9, 2016 at 1:07 PM, Peter Zijlstra > > wrote: > > > > > > On Fri, Apr 08, 2016 at 10:59:59PM +0200, Rafael J. Wysocki > > > wrote: > > > > > > > > On F

Re: [PATCH v5 2/2] mm, thp: avoid unnecessary swapin in khugepaged

2016-04-07 Thread Rik van Riel
On Thu, 2016-04-07 at 21:58 +0300, Cyrill Gorcunov wrote: > On Thu, Apr 07, 2016 at 08:28:01PM +0300, Ebru Akagunduz wrote: > ... > > > > + swap = get_mm_counter(mm, MM_SWAPENTS); > > + curr_allocstall = sum_vm_event(ALLOCSTALL); > > + /* > > +  * When system under pressure, don't swapin r

Re: [PATCH v5 2/2] mm, thp: avoid unnecessary swapin in khugepaged

2016-04-07 Thread Rik van Riel
188 kB |%25| > --- > Without patch | 389728 kB | 194560 kB | 410272 kB |%49| > ----------- > > Signed-off-by: Ebru Akagunduz Acked-by: Rik van Riel -- All Rights Reversed.

Re: [PATCH 2/3] mm: filemap: only do access activations on reads

2016-04-04 Thread Rik van Riel
On Mon, 2016-04-04 at 14:22 -0700, Andrew Morton wrote: > On Mon, 4 Apr 2016 13:13:37 -0400 Johannes Weiner wrote: > > > > > Andres Freund observed that his database workload is struggling > > with > > the transaction journal creating pressure on frequently read pages. > > > > Access patter

Re: [PATCH] kvm: x86: make lapic hrtimer pinned

2016-04-04 Thread Rik van Riel
host, anyway, I don't see a downside to your patch. If that is ever changed (eg. allowing delivery of a timer interrupt to a VCPU without trapping to the host), we may want to revisit this. Until then... Acked-by: Rik van Riel -- All Rights Reversed.

Re: [RFC] sched: unused cpu in affine workload

2016-04-04 Thread Rik van Riel
On Mon, 2016-04-04 at 15:23 +0200, Peter Zijlstra wrote: > On Mon, Apr 04, 2016 at 11:38:44AM +0200, Ingo Molnar wrote: > > > > We'd upgrade that to O(nr_cpus^2), which is totally unrealistic > > with 16,000 CPUs  > > even in a slowpath - but it would probably cause problems even with > > 120 CPUs

Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative

2016-04-04 Thread Rik van Riel
On Mon, 2016-04-04 at 15:31 -0400, Chris Metcalf wrote: > On 4/4/2016 3:12 PM, Rik van Riel wrote: > > > > On Fri, 2016-04-01 at 15:42 -0400, Chris Metcalf wrote: > > > > > > On arm64, when calling enqueue_task_fair() from > > > migration_cpu_stop(), >

Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative

2016-04-04 Thread Rik van Riel
On Fri, 2016-04-01 at 15:42 -0400, Chris Metcalf wrote: > On arm64, when calling enqueue_task_fair() from migration_cpu_stop(), > we find the nr_running value updated by add_nr_running(), but the > cfs.nr_running value has not always yet been updated.  Accordingly, > the sched_can_stop_tick() false

Re: [PATCH v4 2/2] mm, thp: avoid unnecessary swapin in khugepaged

2016-03-22 Thread Rik van Riel
On Mon, 2016-03-21 at 16:36 +0100, Michal Hocko wrote: > On Sun 20-03-16 20:07:39, Ebru Akagunduz wrote: > > > > Currently khugepaged makes swapin readahead to improve > > THP collapse rate. This patch checks vm statistics > > to avoid workload of swapin, if unnecessary. So that > > when system un

Re: [PATCH] mm: Export symbols unmapped_area() & unmapped_area_topdown()

2016-03-19 Thread Rik van Riel
On Wed, 2016-03-16 at 13:36 -0700, Christoph Hellwig wrote: > On Wed, Mar 16, 2016 at 05:10:34PM +, Olu Ogunbowale wrote: > > > > From: Olujide Ogunbowale > > > > Export the memory management functions, unmapped_area() & > > unmapped_area_topdown(), as GPL symbols; this allows the kernel to

Re: [PATCH v3 1/2] mm, vmstat: calculate particular vm event

2016-03-14 Thread Rik van Riel
all the events. > > Signed-off-by: Ebru Akagunduz > Acked-by: Kirill A. Shutemov Reviewed-by: Rik van Riel -- All Rights Reversed.

Re: [PATCH v2 2/2] mm, thp: avoid unnecessary swapin in khugepaged

2016-03-13 Thread Rik van Riel
On Mon, 2016-03-14 at 02:33 +0300, Kirill A. Shutemov wrote: > On Sun, Mar 13, 2016 at 11:28:55AM +0200, Ebru Akagunduz wrote: > >  > > @@ -2493,7 +2494,14 @@ static void collapse_huge_page(struct > > mm_struct *mm, > >   goto out; > >   } > >   > > - __collapse_huge_page_swapin(mm, v

Re: [Qemu-devel] [RFC qemu 0/4] A PV solution for live migration optimization

2016-03-09 Thread Rik van Riel
On Wed, 2016-03-09 at 20:04 +0300, Roman Kagan wrote: > On Wed, Mar 09, 2016 at 05:41:39PM +0200, Michael S. Tsirkin wrote: > > On Wed, Mar 09, 2016 at 05:28:54PM +0300, Roman Kagan wrote: > > > For (1) I've been trying to make a point that skipping clean > > > pages is > > > much more likely to re

Re: [PATCH] sched/cputime: Fix steal time accounting vs. cpu hotplug

2016-03-04 Thread Rik van Riel
mmit 095c0aa83e52 "sched: adjust scheduler cpu power for > stolen time" > Fixes: commit aa483808516c "sched: Remove irq time from available CPU > power" > Signed-off-by: Thomas Gleixner > Cc: sta...@vger.kernel.org Acked-by: Rik van Riel -- All Rights Reversed.

[tip:sched/core] time, acct: Drop irq save & restore from __acct_update_integrals()

2016-02-29 Thread tip-bot for Rik van Riel
Commit-ID: 9344c92c2e72e495f695caef8364b3dd73af0eab Gitweb: http://git.kernel.org/tip/9344c92c2e72e495f695caef8364b3dd73af0eab Author: Rik van Riel AuthorDate: Wed, 10 Feb 2016 20:08:26 -0500 Committer: Ingo Molnar CommitDate: Mon, 29 Feb 2016 09:53:09 +0100 time, acct: Drop irq save

[tip:sched/core] sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity

2016-02-29 Thread tip-bot for Rik van Riel
Commit-ID: ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd Gitweb: http://git.kernel.org/tip/ff9a9b4c4334b53b52ee9279f30bd5dd92ea9bdd Author: Rik van Riel AuthorDate: Wed, 10 Feb 2016 20:08:27 -0500 Committer: Ingo Molnar CommitDate: Mon, 29 Feb 2016 09:53:10 +0100 sched, time: Switch

[tip:sched/core] acct, time: Change indentation in __acct_update_integrals()

2016-02-29 Thread tip-bot for Rik van Riel
Commit-ID: b2add86edd3bc050af350515e6ba26f4622c38f3 Gitweb: http://git.kernel.org/tip/b2add86edd3bc050af350515e6ba26f4622c38f3 Author: Rik van Riel AuthorDate: Wed, 10 Feb 2016 20:08:25 -0500 Committer: Ingo Molnar CommitDate: Mon, 29 Feb 2016 09:53:09 +0100 acct, time: Change

[tip:sched/core] sched, time: Remove non-power-of-two divides from __acct_update_integrals()

2016-02-29 Thread tip-bot for Rik van Riel
Commit-ID: 382c2fe994321d503647ce8ee329b9420dc7c1f9 Gitweb: http://git.kernel.org/tip/382c2fe994321d503647ce8ee329b9420dc7c1f9 Author: Rik van Riel AuthorDate: Wed, 10 Feb 2016 20:08:24 -0500 Committer: Ingo Molnar CommitDate: Mon, 29 Feb 2016 09:53:08 +0100 sched, time: Remove non

Re: [RFC v5 0/3] mm: make swapin readahead to gain more thp performance

2016-02-26 Thread Rik van Riel
On Thu, 2016-02-25 at 22:17 -0800, Hugh Dickins wrote: > On Fri, 26 Feb 2016, Ebru Akagunduz wrote: > > in Thu, Feb 25, 2016 at 05:35:50PM -0500, Rik van Riel wrote: >  > > > Am I forgetting anything obvious? > > >  > > > Is this too aggress

Re: [RFC v5 0/3] mm: make swapin readahead to gain more thp performance

2016-02-25 Thread Rik van Riel
On Wed, 2016-02-24 at 23:36 -0800, Hugh Dickins wrote: >  > Doesn't this imply that __collapse_huge_page_swapin() will initiate > all > the necessary swapins for a THP, then (given the > FAULT_FLAG_ALLOW_RETRY) > not wait for them to complete, so khugepaged will give up on that > extent > and move

Re: [PATCH 1/1] mm: thp: Redefine default THP defrag behaviour disable it by default

2016-02-25 Thread Rik van Riel
pages. I wonder if we should consider mlock one of the slow paths where we should try to actually take the time to create THPs. Also, we might consider doing THP collapse from the NUMA page migration opportunistically, if there is a free 2MB page available on the destination host. Having said al

Re: [PATCH] mm: limit direct reclaim for higher order allocations

2016-02-24 Thread Rik van Riel
On Thu, 2016-02-25 at 09:30 +0900, Joonsoo Kim wrote: > On Wed, Feb 24, 2016 at 05:17:56PM -0500, Rik van Riel wrote: > > On Wed, 2016-02-24 at 14:15 -0800, David Rientjes wrote: > > > On Wed, 24 Feb 2016, Rik van Riel wrote: > > > > > > >

Re: [PATCH] mm: limit direct reclaim for higher order allocations

2016-02-24 Thread Rik van Riel
On Wed, 2016-02-24 at 15:02 -0800, Andrew Morton wrote: > On Wed, 24 Feb 2016 16:38:50 -0500 Rik van Riel > wrote: > > > For multi page allocations smaller than PAGE_ALLOC_COSTLY_ORDER, > > the kernel will do direct reclaim if compaction failed for any > > reason.

Re: [PATCH] mm: limit direct reclaim for higher order allocations

2016-02-24 Thread Rik van Riel
On Wed, 2016-02-24 at 14:15 -0800, David Rientjes wrote: > On Wed, 24 Feb 2016, Rik van Riel wrote: > > > For multi page allocations smaller than PAGE_ALLOC_COSTLY_ORDER, > > the kernel will do direct reclaim if compaction failed for any > > reason. This worked fine when

[PATCH] mm: limit direct reclaim for higher order allocations

2016-02-24 Thread Rik van Riel
systems, this may be enough to obtain contiguous free memory areas to satisfy small allocations, continuing our strategy of relying on luck occasionally. On larger systems, relying on luck like that has not been working for years. Signed-off-by: Rik van Riel --- mm/vmscan.c | 19

Re: [PATCH] mm,vmscan: compact memory from kswapd when lots of memory free already

2016-02-23 Thread Rik van Riel
On Tue, 2016-02-23 at 10:18 +0100, Vlastimil Babka wrote: > On 02/23/2016 04:50 AM, Rik van Riel wrote: > > If kswapd is woken up for a higher order allocation, for example > > from alloc_skb, but the system already has lots of memory free, > > kswapd_shrink_zone will rig

[PATCH] mm,vmscan: compact memory from kswapd when lots of memory free already

2016-02-22 Thread Rik van Riel
doing anything to relieve the situation that caused it to be woken up. Going ahead with compaction when kswapd did not attempt to reclaim any memory, and as a consequence did not reclaim any memory, is the right thing to do in this situation. Signed-off-by: Rik van Riel --- mm/vmscan.c | 2 +- 1

Re: [PATCH v2] mm: scale kswapd watermarks in proportion to memory

2016-02-22 Thread Rik van Riel
i.e. 25% of the emergency reserve. > > On a 140G machine, this raises the default watermark steps - the > distance between min and low, and low and high - from 16M to 143M. > > Signed-off-by: Johannes Weiner > Acked-by: Mel Gorman Acked-by: Rik van Riel -- All Rights Rever

Re: [RFC PATCH] proc: do not include shmem and driver pages in /proc/meminfo::Cached

2016-02-18 Thread Rik van Riel
ebody who already subtracts > Shmem > from Cached. > > What are your thoughts on this? > Reviewed-by: Rik van Riel -- All rights reversed

Re: [PATCH] mm: scale kswapd watermarks in proportion to memory

2016-02-18 Thread Rik van Riel
On Thu, 2016-02-18 at 11:41 -0500, Johannes Weiner wrote: > In machines with 140G of memory and enterprise flash storage, we have > seen read and write bursts routinely exceed the kswapd watermarks and > cause thundering herds in direct reclaim. Unfortunately, the only way > to tune kswapd aggressi

Re: Unhelpful caching decisions, possibly related to active/inactive sizing

2016-02-17 Thread Rik van Riel
hat is in the page. This patch ignores partial writes, because it is unclear whether the complexity of identifying those is worth any potential performance gain obtained from better caching pages that see repeated partial writes at large enough intervals to not get caught by the use-twice promoti

[PATCH RHEL6.8] x86/mm: Improve AMD Bulldozer ASLR workaround

2016-02-16 Thread Rik van Riel
Fixes bug 1240883 Brew build: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=10506428 RHEL6: code changed around from upstream so the address transformations happen in the RHEL6 code flow. Tested on amd-pike-08.klab.eng.bos.redhat.com commit 4e26d11f52684dc8b1632a8cfe450cb5197a8464

Re: [PATCH] kernel: fs: drop_caches: add dds drop_caches_count

2016-02-16 Thread Rik van Riel
On Tue, 2016-02-16 at 16:28 +1100, Dave Chinner wrote: > On Mon, Feb 15, 2016 at 03:52:31PM -0800, Daniel Walker wrote: > > On 02/15/2016 03:05 PM, Dave Chinner wrote: > > >  > > > As for a replacement, looking at what pages you consider > > > "droppable" > > > is really only file pages that are no

Re: computing drop-able caches

2016-02-11 Thread Rik van Riel
On Wed, 2016-02-10 at 11:11 -0800, Daniel Walker wrote: > On 02/10/2016 10:13 AM, Dave Hansen wrote: > > On 02/10/2016 10:04 AM, Daniel Walker wrote: > > > > [Linux_0:/]$ echo 3 > /proc/sys/vm/drop_caches > > > > [Linux_0:/]$ cat /proc/meminfo > > > > MemTotal:3977836 kB > > > > MemFree:   

Re: Unhelpful caching decisions, possibly related to active/inactive sizing

2016-02-11 Thread Rik van Riel
only half the page cache can currently be used to cache the database working set. This patch automatically increases that fraction on larger systems, using the same ratio that has already been used for anonymous memory. Signed-off-by: Rik van Riel Reported-by: Andres Freund --- mm/vmscan.c | 3 ++-

Re: [PATCH 4/4] sched,time: only call account_{user,sys,guest,idle}_time once a jiffy

2016-02-09 Thread Rik van Riel
On Tue, 2016-02-09 at 18:11 +0100, Frederic Weisbecker wrote: >  > So for any T_slice being a given cpu timeslice (in secs) executed > between > two ring switch (user <-> kernel), we are going to account: 1 * > P(T_slice*HZ) > (P() stand for probability here). > > Now after this patch, the scenari

Re: [PATCH 4/4] sched,time: only call account_{user,sys,guest,idle}_time once a jiffy

2016-02-09 Thread Rik van Riel
On Tue, 2016-02-09 at 18:11 +0100, Frederic Weisbecker wrote: > On Tue, Feb 02, 2016 at 12:19:46PM -0500, r...@redhat.com wrote: > > From: Rik van Riel > > > > After removing __acct_update_integrals from the profile, > > native_sched_clock remains as the top CPU user.

[tip:sched/core] sched/numa: Spread memory according to CPU and memory use

2016-02-09 Thread tip-bot for Rik van Riel
Commit-ID: 4142c3ebb685bb338b7d96090d8f90ff49065ff6 Gitweb: http://git.kernel.org/tip/4142c3ebb685bb338b7d96090d8f90ff49065ff6 Author: Rik van Riel AuthorDate: Mon, 25 Jan 2016 17:07:39 -0500 Committer: Ingo Molnar CommitDate: Tue, 9 Feb 2016 14:47:18 +0100 sched/numa: Spread memory

Re: [PATCH 0/4 v5] sched,time: reduce nohz_full syscall overhead 40%

2016-02-08 Thread Rik van Riel
On 02/02/2016 12:19 PM, r...@redhat.com wrote: > (v5: address comments by Frederic & Peter, fix bug found by Eric) > > Running with nohz_full introduces a fair amount of overhead. > Specifically, various things that are usually done from the > timer interrupt are now done at syscall, irq, and gues

Re: [PATCH] genirq: Add default affinity mask command line option

2016-02-03 Thread Rik van Riel
On 02/03/2016 01:52 PM, Thomas Gleixner wrote: > From: Thomas Gleixner Date: Fri May 25 > 16:59:47 2012 +0200 Subject: genirq: Add default affinity mask > command line option > > If we isolate CPUs, then we don't want random device interrupts on >

Re: [PATCH 4/4] sched,time: only call account_{user,sys,guest,idle}_time once a jiffy

2016-02-01 Thread Rik van Riel
On 02/01/2016 03:00 PM, Eric Dumazet wrote: > On Mon, 2016-02-01 at 14:21 -0500, r...@redhat.com wrote: >> From: Rik van Riel >> > >> #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN >> +static bool vtime_jiffies_changed(struct task_struct *tsk, unsigned long >> now)

Re: [PATCH 4/4] sched,time: only call account_{user,sys,guest,idle}_time once a jiffy

2016-02-01 Thread Rik van Riel
On 02/01/2016 04:29 AM, Peter Zijlstra wrote: > On Sun, Jan 31, 2016 at 09:12:31PM -0500, r...@redhat.com wrote: >> Run times for the microbenchmark: >> >> 4.4 3.8 seconds >> 4.5-rc1 3.7 seconds >> 4.5-rc1 + first patch3.3 second

Re: [PATCH 3/4] time,acct: drop irq save & restore from __acct_update_integrals

2016-02-01 Thread Rik van Riel
On 02/01/2016 04:28 AM, Peter Zijlstra wrote: > It's just the acct_account_cputime() callers that I suspect will all have > IRQs disabled, but it would still be good to verify that with a > WARN_ON(!irqs_disabled()) test in there for at least one test run, and > then include that you did this in th

Re: [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals

2016-02-01 Thread Rik van Riel
On 02/01/2016 04:22 AM, Peter Zijlstra wrote: > On Mon, Feb 01, 2016 at 09:37:00AM +0100, Thomas Gleixner wrote: >> On Sun, 31 Jan 2016, r...@redhat.com wrote: >>> @@ -93,9 +93,9 @@ void xacct_add_tsk(struct taskstats *stats, struct >>> task_struct *p) >>> { >>> struct mm_struct *mm; >>> >>

Re: [PATCH 1/4] sched,time: remove non-power-of-two divides from __acct_update_integrals

2016-01-30 Thread Rik van Riel
On 01/30/2016 09:44 AM, Frederic Weisbecker wrote: > On Fri, Jan 29, 2016 at 10:36:02PM -0500, r...@redhat.com wrote: >> From: Rik van Riel >> >> When running a microbenchmark calling an invalid syscall number >> in a loop, on a nohz_full CPU, we spend a fu

Re: [PATCH 1/2] sched,time: remove pointless divides from __acct_update_integrals

2016-01-29 Thread Rik van Riel
On 01/29/2016 10:36 PM, Frederic Weisbecker wrote: > On Sat, Jan 30, 2016 at 12:10:18AM +0100, Peter Zijlstra wrote: >> On Fri, Jan 29, 2016 at 05:22:59PM -0500, r...@redhat.com wrote: >>> From: Rik van Riel >>> >>> When running a microbenchmark calling an inva

Re: [PATCH 2/2] sched,time: call __acct_update_integrals once a jiffy

2016-01-29 Thread Rik van Riel
On 01/29/2016 05:23 PM, r...@redhat.com wrote: > From: Rik van Riel > This speeds up ... ok, that changelog got truncated :( Here is the full version: Because __acct_update_integrals does nothing unless the time interval in question exceeds a jiffy, there is no real reason to call i

Re: computing drop-able caches

2016-01-29 Thread Rik van Riel
On 01/28/2016 08:55 PM, Johannes Weiner wrote: > On Thu, Jan 28, 2016 at 05:29:41PM -0800, Daniel Walker wrote: >> On 01/28/2016 05:03 PM, Daniel Walker wrote: >> [regarding MemAvailable] >> >> This new metric purportedly helps userspace assess available memory. But, >> its again based on heuristic,

[PATCH] sched,numa,mm: spread memory according to CPU and memory use

2016-01-25 Thread Rik van Riel
e seems to result in fairer distribution of memory between nodes, with more memory bandwidth for each instance. Signed-off-by: Rik van Riel --- kernel/sched/fair.c | 86 + 1 file changed, 47 insertions(+), 39 deletions(-) diff --git a/kernel/sc

Re: [LSF/MM TOPIC] VM containers

2016-01-25 Thread Rik van Riel
On 01/24/2016 12:06 PM, One Thousand Gnomes wrote: >>> That changes some of the goals the memory management subsystem has, >>> from "use all the resources effectively" to "use as few resources as >>> necessary, in case the host needs the memory for something else". > > Also "and take guidance/prov

[LSF/MM TOPIC] VM containers

2016-01-22 Thread Rik van Riel
Hi, I am trying to gauge interest in discussing VM containers at the LSF/MM summit this year. Projects like ClearLinux, Qubes, and others are all trying to use virtual machines as better isolated containers. That changes some of the goals the memory management subsystem has, from "use all the res

Re: [PATCH] mm: mempolicy: skip non-migratable VMAs when setting MPOL_MF_LAZY

2016-01-06 Thread Rik van Riel
p non-migratable > VMAs. The changelog could use a better description of exactly what the issue is, and why calling change_prot_numa on a non-migratable VMA is causing problems. > Signed-off-by: Liang Chen > Signed-off-by: Gavin Guo For the code itself: Acked-by: Rik van Riel P

Re: new warning on sysrq kernel crash trigger

2015-12-16 Thread Rik van Riel
On 12/15/2015 07:52 PM, Ani Sinha wrote: > Rik, should I send a separate email with the patch or you are OK > with what I sent in the email? Are you queueing up my patch for > applying upstream? I don't have a git tree for people to pull from, and i

Re: new warning on sysrq kernel crash trigger

2015-12-16 Thread Rik van Riel
On 12/14/2015 07:14 PM, Anirban Sinha wrote: > > > On Mon, 14 Dec 2015, Rik van Riel wrote: > >> On 12/14/2015 11:24 AM, Ani Sinha wrote: >>> Rik, any comments? >> >> Another good option is to

Re: sched : performance regression 24% between 4.4rc4 and 4.3 kernel

2015-12-14 Thread Rik van Riel
On 12/14/2015 06:52 PM, Jirka Hladky wrote: > Hi all, > > I have the results of bisecting: > > first bad commit: [973759c80db96ed4b4c5cb85ac7d48107f801371] Merge tag > 'v4.3-rc1' into sched/core, to refresh the branch > > Could you please have a look at this commit why it has caused the > perfor

Re: new warning on sysrq kernel crash trigger

2015-12-14 Thread Rik van Riel
On 12/14/2015 11:24 AM, Ani Sinha wrote: > Rik, any comments? Another good option is to simply ignore this warning, or drop the rcu_read_lock before doing the alt-sysrq-c action. After all, alt-sysrq-c is "crash the system, take a crash dump", which is not an action the system ever returns from.

Re: new warning on sysrq kernel crash trigger

2015-12-11 Thread Rik van Riel
>>> [ 978.987358] Preemption disabled at:[] printk+0x48/0x4a >>> >>> >>> I have bisected this to the following change : >>> >>> commit 984d74a72076a12b400339973e8c98fd2fcd90e5 >>> Author: Rik van Riel >>> Date: Fr

Re: [PATCH] sched: remove false-positive warning from wake_up_process()

2015-11-30 Thread Rik van Riel
On 11/30/2015 08:47 PM, Linus Torvalds wrote: > On Mon, Nov 30, 2015 at 5:34 PM, Sasha Levin wrote: >> Futex can have a spurious wake up before we actually wake it up on our own, >> which will trigger this warning if the task is still stopped. > > Actually, I think it would presumably be the othe

Re: [PATCH] x86_64: enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN

2015-11-30 Thread Rik van Riel
disable mem hotplug) > can disable memory hotplug with 'acpi_no_memhotplug = 1' > to avoid automatic SWIOTLB initialization. > > Tested on QEMU/KVM and HyperV. > > Signed-off-by: Igor Mammedov Acked-by: Rik van Riel

Re: [lkp] [mm, page_alloc] d0164adc89: -100.0% fsmark.app_overhead

2015-11-26 Thread Rik van Riel
On 11/26/2015 08:25 AM, Mel Gorman wrote: On Thu, Nov 26, 2015 at 08:56:12AM +0800, kernel test robot wrote: FYI, we noticed the below changes on https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master commit d0164adc89f6bb374d304ffcc375c6d2652fe67d ("mm, page_alloc: distingui

[tip:sched/core] sched/numa: Cap PTE scanning overhead to 3% of run time

2015-11-23 Thread tip-bot for Rik van Riel
Commit-ID: 51170840fe91dfca10fd533b303ea39b2524782a Gitweb: http://git.kernel.org/tip/51170840fe91dfca10fd533b303ea39b2524782a Author: Rik van Riel AuthorDate: Thu, 5 Nov 2015 15:56:23 -0500 Committer: Ingo Molnar CommitDate: Mon, 23 Nov 2015 09:37:54 +0100 sched/numa: Cap PTE

Re: crazy idea: big percpu lock (Re: task isolation)

2015-11-10 Thread Rik van Riel
On 10/28/2015 02:45 PM, Andy Lutomirski wrote: >> The model I chose is to have a per-cpu state that indicates whether >> the core is in kernel space, in user space, or in user space with >> a TLB flush pending. On entry to user space with task isolation >> in effect we just set the state to "user

[tip:sched/urgent] sched/numa: Fix math underflow in task_tick_numa()

2015-11-09 Thread tip-bot for Rik van Riel
Commit-ID: 25b3e5a3344e1f700c1efec5b6f0199f04707fb1 Gitweb: http://git.kernel.org/tip/25b3e5a3344e1f700c1efec5b6f0199f04707fb1 Author: Rik van Riel AuthorDate: Thu, 5 Nov 2015 15:56:22 -0500 Committer: Ingo Molnar CommitDate: Mon, 9 Nov 2015 16:13:27 +0100 sched/numa: Fix math

Re: [PATCH 2/3] context_tracking: avoid irq_save/irq_restore on guest entry and exit

2015-11-09 Thread Rik van Riel
ext tracking functions are > called by guest_enter and guest_exit. > > Split the body of context_tracking_entry and context_tracking_exit > out to __-prefixed functions, and use them from KVM. > > Rik van Riel has measured this to speed up a tight vmentry/vmexit > loop by a

Re: [PATCH 1/3] context_tracking: remove duplicate enabled check

2015-11-09 Thread Rik van Riel
s. > > Pull the check up to those functions, by making them simple > wrappers around the user_enter and user_exit inline functions. > > Cc: Andy Lutomirski > Cc: Frederic Weisbecker > Cc: Rik van Riel > Cc: Paul McKenney > Signed-off-by: Paolo Bonzini Reviewed-by

Re: [PATCH 0/3] cpuidle: small improvements & fixes for menu governor (resend)

2015-11-05 Thread Rik van Riel
On 11/05/2015 05:34 PM, Rafael J. Wysocki wrote: > On Tuesday, November 03, 2015 05:34:16 PM r...@redhat.com wrote: >> While working on a paravirt cpuidle driver for KVM guests, I >> noticed a number of small logic errors in the menu governor >> code. >> >> These patches should get rid of some arti
