Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-24 Thread Aaron Lu
On Thu, Mar 23, 2017 at 08:38:43AM -0700, Dave Hansen wrote:
> On 03/22/2017 01:41 AM, Aaron Lu wrote:
> > On Wed, Mar 22, 2017 at 03:33:35PM +0900, Minchan Kim wrote:
> >> On Wed, Mar 15, 2017 at 05:00:02PM +0800, Aaron Lu wrote:
> >>> Introduce a workqueue for all the free workers so that user can fine
> >>> tune how many workers can be active through sysfs interface: max_active.
> >>> More workers will normally lead to better performance, but too many can
> >>> cause severe lock contention.
> >>
> >> Let me ask a question.
> >>
> >> How well can workqueue distribute the jobs in multiple CPU?
> > 
> > I would say it's good enough for my needs.
> > After all, it doesn't need many kworkers to achieve the 50% time
> > decrease: 2-4 kworkers for EP and 4-8 kworkers for EX are enough from
> > previous attched data.
> 
> It's also worth noting that we'd like to *also* like to look into
> increasing how scalable freeing pages to a given zone is.

Still on EX, I restricted the allocation to be only on node 1, with
120G memory allocated there:

max_activetimecompared to base  lock from perf
base(no parallel) 3.81s ±3.3% N/A   <1%
1 3.10s ±7.7% ↓18.6%14.76%
2 2.44s ±13.6%↓35.9%36.95%
4 2.07s ±13.6%↓45.6%59.67%
8 1.98s ±0.4% ↓48.0%62.59%
162.01s ±2.4% ↓47.2%79.62%

If we can improve the scalibility of freeing a given zone, then parallel
free will be able to achieve more.

BTW, the lock is basically pgdat->lru_lock in release_pages and
zone->lock in free_pcppages_bulk:
62.59%62.59%  [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
37.17% 
native_queued_spin_lock_slowpath;_raw_spin_lock_irqsave;free_pcppages_bulk;free_hot_cold_page;free_hot_cold_page_list;release_pages;free_pages_and_swap_cache;tlb_flush_mmu_free_batches;batch_free_work;process_one_work;worker_thread;kthread;ret_from_fork
25.27% 
native_queued_spin_lock_slowpath;_raw_spin_lock_irqsave;release_pages;free_pages_and_swap_cache;tlb_flush_mmu_free_batches;batch_free_work;process_one_work;worker_thread;kthread;ret_from_fork


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-24 Thread Aaron Lu
On Thu, Mar 23, 2017 at 08:38:43AM -0700, Dave Hansen wrote:
> On 03/22/2017 01:41 AM, Aaron Lu wrote:
> > On Wed, Mar 22, 2017 at 03:33:35PM +0900, Minchan Kim wrote:
> >> On Wed, Mar 15, 2017 at 05:00:02PM +0800, Aaron Lu wrote:
> >>> Introduce a workqueue for all the free workers so that user can fine
> >>> tune how many workers can be active through sysfs interface: max_active.
> >>> More workers will normally lead to better performance, but too many can
> >>> cause severe lock contention.
> >>
> >> Let me ask a question.
> >>
> >> How well can workqueue distribute the jobs in multiple CPU?
> > 
> > I would say it's good enough for my needs.
> > After all, it doesn't need many kworkers to achieve the 50% time
> > decrease: 2-4 kworkers for EP and 4-8 kworkers for EX are enough from
> > previous attched data.
> 
> It's also worth noting that we'd like to *also* like to look into
> increasing how scalable freeing pages to a given zone is.

Still on EX, I restricted the allocation to be only on node 1, with
120G memory allocated there:

max_activetimecompared to base  lock from perf
base(no parallel) 3.81s ±3.3% N/A   <1%
1 3.10s ±7.7% ↓18.6%14.76%
2 2.44s ±13.6%↓35.9%36.95%
4 2.07s ±13.6%↓45.6%59.67%
8 1.98s ±0.4% ↓48.0%62.59%
162.01s ±2.4% ↓47.2%79.62%

If we can improve the scalibility of freeing a given zone, then parallel
free will be able to achieve more.

BTW, the lock is basically pgdat->lru_lock in release_pages and
zone->lock in free_pcppages_bulk:
62.59%62.59%  [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
37.17% 
native_queued_spin_lock_slowpath;_raw_spin_lock_irqsave;free_pcppages_bulk;free_hot_cold_page;free_hot_cold_page_list;release_pages;free_pages_and_swap_cache;tlb_flush_mmu_free_batches;batch_free_work;process_one_work;worker_thread;kthread;ret_from_fork
25.27% 
native_queued_spin_lock_slowpath;_raw_spin_lock_irqsave;release_pages;free_pages_and_swap_cache;tlb_flush_mmu_free_batches;batch_free_work;process_one_work;worker_thread;kthread;ret_from_fork


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-23 Thread Dave Hansen
On 03/22/2017 01:41 AM, Aaron Lu wrote:
> On Wed, Mar 22, 2017 at 03:33:35PM +0900, Minchan Kim wrote:
>> On Wed, Mar 15, 2017 at 05:00:02PM +0800, Aaron Lu wrote:
>>> Introduce a workqueue for all the free workers so that user can fine
>>> tune how many workers can be active through sysfs interface: max_active.
>>> More workers will normally lead to better performance, but too many can
>>> cause severe lock contention.
>>
>> Let me ask a question.
>>
>> How well can workqueue distribute the jobs in multiple CPU?
> 
> I would say it's good enough for my needs.
> After all, it doesn't need many kworkers to achieve the 50% time
> decrease: 2-4 kworkers for EP and 4-8 kworkers for EX are enough from
> previous attched data.

It's also worth noting that we'd like to *also* like to look into
increasing how scalable freeing pages to a given zone is.


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-23 Thread Dave Hansen
On 03/22/2017 01:41 AM, Aaron Lu wrote:
> On Wed, Mar 22, 2017 at 03:33:35PM +0900, Minchan Kim wrote:
>> On Wed, Mar 15, 2017 at 05:00:02PM +0800, Aaron Lu wrote:
>>> Introduce a workqueue for all the free workers so that user can fine
>>> tune how many workers can be active through sysfs interface: max_active.
>>> More workers will normally lead to better performance, but too many can
>>> cause severe lock contention.
>>
>> Let me ask a question.
>>
>> How well can workqueue distribute the jobs in multiple CPU?
> 
> I would say it's good enough for my needs.
> After all, it doesn't need many kworkers to achieve the 50% time
> decrease: 2-4 kworkers for EP and 4-8 kworkers for EX are enough from
> previous attched data.

It's also worth noting that we'd like to *also* like to look into
increasing how scalable freeing pages to a given zone is.


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-23 Thread Minchan Kim
On Wed, Mar 22, 2017 at 09:43:04PM +0800, Aaron Lu wrote:
> On Wed, Mar 22, 2017 at 05:55:12PM +0900, Minchan Kim wrote:
> > On Wed, Mar 22, 2017 at 04:41:04PM +0800, Aaron Lu wrote:
> > > My understanding of the unbound workqueue is that it will create a
> > > thread pool for each node, versus each CPU as in the bound workqueue
> > > case, and use threads from the thread pool(create threads if not enough)
> > > to do the work.
> > 
> > Yes, that was my understand so I read code and found that
> > 
> > insert_work:
> > ..
> > if (__need_more_worker(pool))
> > wake_up_worker(pool); 
> > 
> > so I thought if there is a running thread in that node, workqueue
> > will not wake any other threads so parallelism should be max 2.
> > AFAIK, if the work goes sleep, scheduler will spawn new worker
> > thread so the active worker could be a lot but I cannot see any
> > significant sleepable point in that work(ie, batch_free_work).
> 
> Looks like worker_thread() will spawn new worker through manage_worker().
> 
> Note that pool->nr_running will always be zero for an unbound workqueue
> and thus need_more_worker() will return true as long as there are queued
> work items in the pool.

Aha, it solves my wonder. Thanks a lot!


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-23 Thread Minchan Kim
On Wed, Mar 22, 2017 at 09:43:04PM +0800, Aaron Lu wrote:
> On Wed, Mar 22, 2017 at 05:55:12PM +0900, Minchan Kim wrote:
> > On Wed, Mar 22, 2017 at 04:41:04PM +0800, Aaron Lu wrote:
> > > My understanding of the unbound workqueue is that it will create a
> > > thread pool for each node, versus each CPU as in the bound workqueue
> > > case, and use threads from the thread pool(create threads if not enough)
> > > to do the work.
> > 
> > Yes, that was my understand so I read code and found that
> > 
> > insert_work:
> > ..
> > if (__need_more_worker(pool))
> > wake_up_worker(pool); 
> > 
> > so I thought if there is a running thread in that node, workqueue
> > will not wake any other threads so parallelism should be max 2.
> > AFAIK, if the work goes sleep, scheduler will spawn new worker
> > thread so the active worker could be a lot but I cannot see any
> > significant sleepable point in that work(ie, batch_free_work).
> 
> Looks like worker_thread() will spawn new worker through manage_worker().
> 
> Note that pool->nr_running will always be zero for an unbound workqueue
> and thus need_more_worker() will return true as long as there are queued
> work items in the pool.

Aha, it solves my wonder. Thanks a lot!


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-22 Thread Aaron Lu
On Wed, Mar 22, 2017 at 05:55:12PM +0900, Minchan Kim wrote:
> On Wed, Mar 22, 2017 at 04:41:04PM +0800, Aaron Lu wrote:
> > My understanding of the unbound workqueue is that it will create a
> > thread pool for each node, versus each CPU as in the bound workqueue
> > case, and use threads from the thread pool(create threads if not enough)
> > to do the work.
> 
> Yes, that was my understand so I read code and found that
> 
> insert_work:
> ..
> if (__need_more_worker(pool))
> wake_up_worker(pool); 
> 
> so I thought if there is a running thread in that node, workqueue
> will not wake any other threads so parallelism should be max 2.
> AFAIK, if the work goes sleep, scheduler will spawn new worker
> thread so the active worker could be a lot but I cannot see any
> significant sleepable point in that work(ie, batch_free_work).

Looks like worker_thread() will spawn new worker through manage_worker().

Note that pool->nr_running will always be zero for an unbound workqueue
and thus need_more_worker() will return true as long as there are queued
work items in the pool.

Thanks,
Aaron


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-22 Thread Aaron Lu
On Wed, Mar 22, 2017 at 05:55:12PM +0900, Minchan Kim wrote:
> On Wed, Mar 22, 2017 at 04:41:04PM +0800, Aaron Lu wrote:
> > My understanding of the unbound workqueue is that it will create a
> > thread pool for each node, versus each CPU as in the bound workqueue
> > case, and use threads from the thread pool(create threads if not enough)
> > to do the work.
> 
> Yes, that was my understand so I read code and found that
> 
> insert_work:
> ..
> if (__need_more_worker(pool))
> wake_up_worker(pool); 
> 
> so I thought if there is a running thread in that node, workqueue
> will not wake any other threads so parallelism should be max 2.
> AFAIK, if the work goes sleep, scheduler will spawn new worker
> thread so the active worker could be a lot but I cannot see any
> significant sleepable point in that work(ie, batch_free_work).

Looks like worker_thread() will spawn new worker through manage_worker().

Note that pool->nr_running will always be zero for an unbound workqueue
and thus need_more_worker() will return true as long as there are queued
work items in the pool.

Thanks,
Aaron


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-22 Thread Minchan Kim
On Wed, Mar 22, 2017 at 04:41:04PM +0800, Aaron Lu wrote:
> On Wed, Mar 22, 2017 at 03:33:35PM +0900, Minchan Kim wrote:
> > Hi,
> > 
> > On Wed, Mar 15, 2017 at 05:00:02PM +0800, Aaron Lu wrote:
> > > Introduce a workqueue for all the free workers so that user can fine
> > > tune how many workers can be active through sysfs interface: max_active.
> > > More workers will normally lead to better performance, but too many can
> > > cause severe lock contention.
> > 
> > Let me ask a question.
> > 
> > How well can workqueue distribute the jobs in multiple CPU?
> 
> I would say it's good enough for my needs.
> After all, it doesn't need many kworkers to achieve the 50% time
> decrease: 2-4 kworkers for EP and 4-8 kworkers for EX are enough from
> previous attched data.
> 
> > I don't ask about currency but parallelism.
> > I guess benefit you are seeing comes from the parallelism and
> > for your goal, unbound wq should spawn a thread per cpu and
> > doing the work in every each CPU. does it work?
> 
> I don't think a unbound workqueue will spawn a thread per CPU, that
> seems too much a cost to have a unbound workqueue.
> 
> My understanding of the unbound workqueue is that it will create a
> thread pool for each node, versus each CPU as in the bound workqueue
> case, and use threads from the thread pool(create threads if not enough)
> to do the work.

Yes, that was my understand so I read code and found that

insert_work:
..
if (__need_more_worker(pool))
wake_up_worker(pool); 

so I thought if there is a running thread in that node, workqueue
will not wake any other threads so parallelism should be max 2.
AFAIK, if the work goes sleep, scheduler will spawn new worker
thread so the active worker could be a lot but I cannot see any
significant sleepable point in that work(ie, batch_free_work).

> 
> I guess you want to ask if the unbound workqueue can spawn enough
> threads to do the job? From the output of 'vmstat 1' during the free()
> test, I can see some 70+ processes in runnable state when I didn't
> set an upper limit for max_active of the workqueue.

Hmm, it seems I was wrong. I am really curious how we can have
70+ active workers in that. Could you explain what I am missing?

Thanks.

> 
> Thanks,
> Aaron
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-22 Thread Minchan Kim
On Wed, Mar 22, 2017 at 04:41:04PM +0800, Aaron Lu wrote:
> On Wed, Mar 22, 2017 at 03:33:35PM +0900, Minchan Kim wrote:
> > Hi,
> > 
> > On Wed, Mar 15, 2017 at 05:00:02PM +0800, Aaron Lu wrote:
> > > Introduce a workqueue for all the free workers so that user can fine
> > > tune how many workers can be active through sysfs interface: max_active.
> > > More workers will normally lead to better performance, but too many can
> > > cause severe lock contention.
> > 
> > Let me ask a question.
> > 
> > How well can workqueue distribute the jobs in multiple CPU?
> 
> I would say it's good enough for my needs.
> After all, it doesn't need many kworkers to achieve the 50% time
> decrease: 2-4 kworkers for EP and 4-8 kworkers for EX are enough from
> previous attched data.
> 
> > I don't ask about currency but parallelism.
> > I guess benefit you are seeing comes from the parallelism and
> > for your goal, unbound wq should spawn a thread per cpu and
> > doing the work in every each CPU. does it work?
> 
> I don't think a unbound workqueue will spawn a thread per CPU, that
> seems too much a cost to have a unbound workqueue.
> 
> My understanding of the unbound workqueue is that it will create a
> thread pool for each node, versus each CPU as in the bound workqueue
> case, and use threads from the thread pool(create threads if not enough)
> to do the work.

Yes, that was my understand so I read code and found that

insert_work:
..
if (__need_more_worker(pool))
wake_up_worker(pool); 

so I thought if there is a running thread in that node, workqueue
will not wake any other threads so parallelism should be max 2.
AFAIK, if the work goes sleep, scheduler will spawn new worker
thread so the active worker could be a lot but I cannot see any
significant sleepable point in that work(ie, batch_free_work).

> 
> I guess you want to ask if the unbound workqueue can spawn enough
> threads to do the job? From the output of 'vmstat 1' during the free()
> test, I can see some 70+ processes in runnable state when I didn't
> set an upper limit for max_active of the workqueue.

Hmm, it seems I was wrong. I am really curious how we can have
70+ active workers in that. Could you explain what I am missing?

Thanks.

> 
> Thanks,
> Aaron
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-22 Thread Aaron Lu
On Wed, Mar 22, 2017 at 03:33:35PM +0900, Minchan Kim wrote:
> Hi,
> 
> On Wed, Mar 15, 2017 at 05:00:02PM +0800, Aaron Lu wrote:
> > Introduce a workqueue for all the free workers so that user can fine
> > tune how many workers can be active through sysfs interface: max_active.
> > More workers will normally lead to better performance, but too many can
> > cause severe lock contention.
> 
> Let me ask a question.
> 
> How well can workqueue distribute the jobs in multiple CPU?

I would say it's good enough for my needs.
After all, it doesn't need many kworkers to achieve the 50% time
decrease: 2-4 kworkers for EP and 4-8 kworkers for EX are enough from
previous attched data.

> I don't ask about currency but parallelism.
> I guess benefit you are seeing comes from the parallelism and
> for your goal, unbound wq should spawn a thread per cpu and
> doing the work in every each CPU. does it work?

I don't think a unbound workqueue will spawn a thread per CPU, that
seems too much a cost to have a unbound workqueue.

My understanding of the unbound workqueue is that it will create a
thread pool for each node, versus each CPU as in the bound workqueue
case, and use threads from the thread pool(create threads if not enough)
to do the work.

I guess you want to ask if the unbound workqueue can spawn enough
threads to do the job? From the output of 'vmstat 1' during the free()
test, I can see some 70+ processes in runnable state when I didn't
set an upper limit for max_active of the workqueue.

Thanks,
Aaron


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-22 Thread Aaron Lu
On Wed, Mar 22, 2017 at 03:33:35PM +0900, Minchan Kim wrote:
> Hi,
> 
> On Wed, Mar 15, 2017 at 05:00:02PM +0800, Aaron Lu wrote:
> > Introduce a workqueue for all the free workers so that user can fine
> > tune how many workers can be active through sysfs interface: max_active.
> > More workers will normally lead to better performance, but too many can
> > cause severe lock contention.
> 
> Let me ask a question.
> 
> How well can workqueue distribute the jobs in multiple CPU?

I would say it's good enough for my needs.
After all, it doesn't need many kworkers to achieve the 50% time
decrease: 2-4 kworkers for EP and 4-8 kworkers for EX are enough from
previous attched data.

> I don't ask about currency but parallelism.
> I guess benefit you are seeing comes from the parallelism and
> for your goal, unbound wq should spawn a thread per cpu and
> doing the work in every each CPU. does it work?

I don't think a unbound workqueue will spawn a thread per CPU, that
seems too much a cost to have a unbound workqueue.

My understanding of the unbound workqueue is that it will create a
thread pool for each node, versus each CPU as in the bound workqueue
case, and use threads from the thread pool(create threads if not enough)
to do the work.

I guess you want to ask if the unbound workqueue can spawn enough
threads to do the job? From the output of 'vmstat 1' during the free()
test, I can see some 70+ processes in runnable state when I didn't
set an upper limit for max_active of the workqueue.

Thanks,
Aaron


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-22 Thread Minchan Kim
Hi,

On Wed, Mar 15, 2017 at 05:00:02PM +0800, Aaron Lu wrote:
> Introduce a workqueue for all the free workers so that user can fine
> tune how many workers can be active through sysfs interface: max_active.
> More workers will normally lead to better performance, but too many can
> cause severe lock contention.

Let me ask a question.

How well can workqueue distribute the jobs in multiple CPU?
I don't ask about currency but parallelism.
I guess benefit you are seeing comes from the parallelism and
for your goal, unbound wq should spawn a thread per cpu and
doing the work in every each CPU. does it work?

> 
> Note that since the zone lock is global, the workqueue is also global
> for all processes, i.e. if we set 8 to max_active, we will have at most
> 8 workers active for all processes that are doing munmap()/exit()/etc.
> 
> Signed-off-by: Aaron Lu 
> ---
>  mm/memory.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 001c7720d773..19b25bb5f45b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -253,6 +253,19 @@ static void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
>   __tlb_reset_range(tlb);
>  }
>  
> +static struct workqueue_struct *batch_free_wq;
> +static int __init batch_free_wq_init(void)
> +{
> + batch_free_wq = alloc_workqueue("batch_free_wq",
> + WQ_UNBOUND | WQ_SYSFS, 0);
> + if (!batch_free_wq) {
> + pr_warn("failed to create workqueue batch_free_wq\n");
> + return -ENOMEM;
> + }
> + return 0;
> +}
> +subsys_initcall(batch_free_wq_init);
> +
>  static void tlb_flush_mmu_free_batches(struct mmu_gather_batch *batch_start,
>  bool free_batch_page)
>  {
> @@ -306,7 +319,7 @@ static void tlb_flush_mmu_free(struct mmu_gather *tlb)
>   batch_free->batch_start = tlb->local.next;
>   INIT_WORK(_free->work, batch_free_work);
>   list_add_tail(_free->list, >worker_list);
> - queue_work(system_unbound_wq, _free->work);
> + queue_work(batch_free_wq, _free->work);
>  
>   tlb->batch_count = 0;
>   tlb->local.next = NULL;
> -- 
> 2.7.4
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org 


Re: [PATCH v2 3/5] mm: use a dedicated workqueue for the free workers

2017-03-22 Thread Minchan Kim
Hi,

On Wed, Mar 15, 2017 at 05:00:02PM +0800, Aaron Lu wrote:
> Introduce a workqueue for all the free workers so that user can fine
> tune how many workers can be active through sysfs interface: max_active.
> More workers will normally lead to better performance, but too many can
> cause severe lock contention.

Let me ask a question.

How well can workqueue distribute the jobs in multiple CPU?
I don't ask about currency but parallelism.
I guess benefit you are seeing comes from the parallelism and
for your goal, unbound wq should spawn a thread per cpu and
doing the work in every each CPU. does it work?

> 
> Note that since the zone lock is global, the workqueue is also global
> for all processes, i.e. if we set 8 to max_active, we will have at most
> 8 workers active for all processes that are doing munmap()/exit()/etc.
> 
> Signed-off-by: Aaron Lu 
> ---
>  mm/memory.c | 15 ++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 001c7720d773..19b25bb5f45b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -253,6 +253,19 @@ static void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
>   __tlb_reset_range(tlb);
>  }
>  
> +static struct workqueue_struct *batch_free_wq;
> +static int __init batch_free_wq_init(void)
> +{
> + batch_free_wq = alloc_workqueue("batch_free_wq",
> + WQ_UNBOUND | WQ_SYSFS, 0);
> + if (!batch_free_wq) {
> + pr_warn("failed to create workqueue batch_free_wq\n");
> + return -ENOMEM;
> + }
> + return 0;
> +}
> +subsys_initcall(batch_free_wq_init);
> +
>  static void tlb_flush_mmu_free_batches(struct mmu_gather_batch *batch_start,
>  bool free_batch_page)
>  {
> @@ -306,7 +319,7 @@ static void tlb_flush_mmu_free(struct mmu_gather *tlb)
>   batch_free->batch_start = tlb->local.next;
>   INIT_WORK(_free->work, batch_free_work);
>   list_add_tail(_free->list, >worker_list);
> - queue_work(system_unbound_wq, _free->work);
> + queue_work(batch_free_wq, _free->work);
>  
>   tlb->batch_count = 0;
>   tlb->local.next = NULL;
> -- 
> 2.7.4
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: mailto:"d...@kvack.org;> em...@kvack.org