On Tue, Dec 03, 2013 at 03:04:00PM +0000, Mel Gorman wrote:
> commit 72403b4a0fbdf433c1fe0127e49864658f6f6468 upstream.

Thank you Mel, I'll queue this backport for the 3.11 kernel.

Cheers,
--
Luis


> 
> Commit 0255d4918480 ("mm: Account for a THP NUMA hinting update as
> one PTE update") was added to account for the number of PTE updates
> when marking pages prot_numa.  task_numa_work was using the old
> return value to track how much address space had been updated.
> Altering the return value causes the scanner to do more work than it
> is configured or documented to in a single unit of work.
> 
> This patch reverts that commit and accounts for the number of THP
> updates separately in vmstat.  It is up to the administrator to
> interpret the pair of values correctly.  This is a straight-forward
> operation and likely to only be of interest when actively debugging NUMA
> balancing problems.
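
For readers following the backport, here is a minimal user-space sketch of the accounting described above. It is not the kernel code: `struct pmd_model`, `struct scan_result`, `model_change_pmd_range()` and `MODEL_HPAGE_PMD_NR` are hypothetical names, and the value 512 assumes x86-64 with 4K base pages and 2M huge pages. It only illustrates that the returned page count tracks the address space covered, while THP updates are counted separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 512 base pages per huge page (x86-64, 2M / 4K). */
#define MODEL_HPAGE_PMD_NR 512UL

/* Hypothetical stand-in for one PMD entry seen by the scanner. */
struct pmd_model {
	bool is_huge;                  /* entry maps a transparent huge page */
	unsigned long nr_ptes_updated; /* PTEs updated when !is_huge */
};

struct scan_result {
	unsigned long pages;           /* value handed back to the scanner */
	unsigned long nr_huge_updates; /* reported separately in vmstat */
};

/* Model of the per-PMD accounting: a THP update covers a full huge page. */
struct scan_result model_change_pmd_range(const struct pmd_model *pmds,
					  size_t n)
{
	struct scan_result r = { 0, 0 };
	size_t i;

	assert(pmds != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (pmds[i].is_huge) {
			r.pages += MODEL_HPAGE_PMD_NR;
			r.nr_huge_updates++;
		} else {
			r.pages += pmds[i].nr_ptes_updated;
		}
	}
	return r;
}
```

With the reverted behaviour a huge PMD contributes a full huge page's worth of base pages to `pages`, so a scanner budget expressed in pages is consumed at the documented rate.
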
> 
> The impact of this patch is that the NUMA PTE scanner will scan slower
> when THP is enabled and workloads may converge slower as a result.  On
> the flip side, system CPU usage should be lower than recent tests
> reported.  This is an illustrative example of a short single JVM specjbb
> test:
> 
> specjbb
>                        3.12.0                3.12.0
>                       vanilla           acctupdates
> TPut 1      26143.00 (  0.00%)     25747.00 ( -1.51%)
> TPut 7     185257.00 (  0.00%)    183202.00 ( -1.11%)
> TPut 13    329760.00 (  0.00%)    346577.00 (  5.10%)
> TPut 19    442502.00 (  0.00%)    460146.00 (  3.99%)
> TPut 25    540634.00 (  0.00%)    549053.00 (  1.56%)
> TPut 31    512098.00 (  0.00%)    519611.00 (  1.47%)
> TPut 37    461276.00 (  0.00%)    474973.00 (  2.97%)
> TPut 43    403089.00 (  0.00%)    414172.00 (  2.75%)
> 
>               3.12.0      3.12.0
>              vanilla acctupdates
> User         5169.64     5184.14
> System        100.45       80.02
> Elapsed       252.75      251.85
> 
> Performance is similar but note the reduction in system CPU time.  While
> this showed a performance gain, it will not be universal but at least
> it'll be behaving as documented.  The vmstats are obviously different but
> here is an obvious interpretation of them from mmtests.
> 
>                                 3.12.0      3.12.0
>                                vanilla acctupdates
> NUMA page range updates        1408326    11043064
> NUMA huge PMD updates                0       21040
> NUMA PTE updates               1408326      291624
> 
> "NUMA page range updates" == nr_pte_updates and is the value returned to
> the NUMA pte scanner.  NUMA huge PMD updates were the number of THP
> updates which in combination can be used to calculate how many ptes were
> updated from userspace.
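
The three figures in the acctupdates column are consistent with the relation sketched below, assuming 512 base pages per huge page (x86-64 with 4K pages and 2M THP). This is inferred from the numbers in the table, not taken from the mmtests source; `numa_update_ops()` and `ASSUMED_HPAGE_PMD_NR` are hypothetical names.

```c
#include <assert.h>

/* Assumption: 512 base pages per huge page (x86-64, 2M / 4K). */
#define ASSUMED_HPAGE_PMD_NR 512UL

/*
 * Hypothetical helper: given the raw numa_pte_updates counter (the
 * "page range updates") and numa_huge_pte_updates, return the number
 * of update operations, counting each THP update once.
 */
unsigned long numa_update_ops(unsigned long page_range_updates,
			      unsigned long huge_updates)
{
	unsigned long huge_pages = huge_updates * ASSUMED_HPAGE_PMD_NR;

	/* The range counter already includes every huge page in full. */
	assert(huge_pages <= page_range_updates);

	return page_range_updates - huge_pages + huge_updates;
}
```

Plugging in the table values, `numa_update_ops(11043064, 21040)` gives 291624, which matches the "NUMA PTE updates" row; with no huge updates the two counters are equal, as in the vanilla column.
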
> 
> Signed-off-by: Mel Gorman <mgor...@suse.de>
> Reported-by: Alex Thorlton <athorl...@sgi.com>
> Reviewed-by: Rik van Riel <r...@redhat.com>
> Signed-off-by: Andrew Morton <a...@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torva...@linux-foundation.org>
> Signed-off-by: Mel Gorman <mgor...@suse.de>
> ---
>  include/linux/vm_event_item.h | 1 +
>  mm/mprotect.c                 | 7 ++++++-
>  mm/vmstat.c                   | 1 +
>  3 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 1855f0a..c557c6d 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -39,6 +39,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>               PAGEOUTRUN, ALLOCSTALL, PGROTATED,
>  #ifdef CONFIG_NUMA_BALANCING
>               NUMA_PTE_UPDATES,
> +             NUMA_HUGE_PTE_UPDATES,
>               NUMA_HINT_FAULTS,
>               NUMA_HINT_FAULTS_LOCAL,
>               NUMA_PAGE_MIGRATE,
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 412ba2b..6c3f56f 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -138,6 +138,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>       pmd_t *pmd;
>       unsigned long next;
>       unsigned long pages = 0;
> +     unsigned long nr_huge_updates = 0;
>       bool all_same_node;
>  
>       pmd = pmd_offset(pud, addr);
> @@ -148,7 +149,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>                               split_huge_page_pmd(vma, addr, pmd);
>                       else if (change_huge_pmd(vma, pmd, addr, newprot,
>                                                prot_numa)) {
> -                             pages++;
> +                             pages += HPAGE_PMD_NR;
> +                             nr_huge_updates++;
>                               continue;
>                       }
>                       /* fall through */
> @@ -168,6 +170,9 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>                       change_pmd_protnuma(vma->vm_mm, addr, pmd);
>       } while (pmd++, addr = next, addr != end);
>  
> +     if (nr_huge_updates)
> +             count_vm_numa_events(NUMA_HUGE_PTE_UPDATES, nr_huge_updates);
> +
>       return pages;
>  }
>  
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 9bb3145..5a442a7 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -812,6 +812,7 @@ const char * const vmstat_text[] = {
>  
>  #ifdef CONFIG_NUMA_BALANCING
>       "numa_pte_updates",
> +     "numa_huge_pte_updates",
>       "numa_hint_faults",
>       "numa_hint_faults_local",
>       "numa_pages_migrated",
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html