linux-next: manual merge of the akpm-current tree with the tip tree

2021-03-22 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  arch/x86/mm/init_64.c

between commit:

  d9f6e12fb0b7 ("x86: Fix various typos in comments")

from the tip tree and commit:

  68f7bf6e7e98 ("x86/vmemmap: drop handling of 4K unaligned vmemmap range")

from the akpm-current tree.

I fixed it up (the latter removed the comments fixed up by the former)
and can carry the fix as necessary. This is now fixed as far as linux-next
is concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell


pgpLNcP8GJ9cI.pgp
Description: OpenPGP digital signature


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2020-12-11 Thread Jason Gunthorpe
On Fri, Dec 11, 2020 at 07:56:54PM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the akpm-current tree got a conflict in:
> 
>   mm/gup.c
> 
> between commit:
> 
>   2a4a06da8a4b ("mm/gup: Provide gup_get_pte() more generic")
> 
> from the tip tree and commit:
> 
>   1eb2fe862a51 ("mm/gup: combine put_compound_head() and unpin_user_page()")
> 
> from the akpm-current tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Looks OK

Thanks,
Jason
 




linux-next: manual merge of the akpm-current tree with the tip tree

2020-12-11 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  mm/gup.c

between commit:

  2a4a06da8a4b ("mm/gup: Provide gup_get_pte() more generic")

from the tip tree and commit:

  1eb2fe862a51 ("mm/gup: combine put_compound_head() and unpin_user_page()")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc mm/gup.c
index 44b0c6b89602,b3d852b4a60c..
--- a/mm/gup.c
+++ b/mm/gup.c
@@@ -2062,29 -1977,62 +1977,6 @@@ EXPORT_SYMBOL(get_user_pages_unlocked)
   * This code is based heavily on the PowerPC implementation by Nick Piggin.
   */
  #ifdef CONFIG_HAVE_FAST_GUP
 -#ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
--
- static void put_compound_head(struct page *page, int refs, unsigned int flags)
 -/*
 - * WARNING: only to be used in the get_user_pages_fast() implementation.
 - *
 - * With get_user_pages_fast(), we walk down the pagetables without taking any
 - * locks.  For this we would like to load the pointers atomically, but sometimes
 - * that is not possible (e.g. without expensive cmpxchg8b on x86_32 PAE).  What
 - * we do have is the guarantee that a PTE will only either go from not present
 - * to present, or present to not present or both -- it will not switch to a
 - * completely different present page without a TLB flush in between; something
 - * that we are blocking by holding interrupts off.
 - *
 - * Setting ptes from not present to present goes:
 - *
 - *   ptep->pte_high = h;
 - *   smp_wmb();
 - *   ptep->pte_low = l;
 - *
 - * And present to not present goes:
 - *
 - *   ptep->pte_low = 0;
 - *   smp_wmb();
 - *   ptep->pte_high = 0;
 - *
 - * We must ensure here that the load of pte_low sees 'l' IFF pte_high sees 'h'.
 - * We load pte_high *after* loading pte_low, which ensures we don't see an older
 - * value of pte_high.  *Then* we recheck pte_low, which ensures that we haven't
 - * picked up a changed pte high. We might have gotten rubbish values from
 - * pte_low and pte_high, but we are guaranteed that pte_low will not have the
 - * present bit set *unless* it is 'l'. Because get_user_pages_fast() only
 - * operates on present ptes we're safe.
 - */
 -static inline pte_t gup_get_pte(pte_t *ptep)
--{
-   if (flags & FOLL_PIN) {
-   mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_RELEASED,
-   refs);
 -  pte_t pte;
--
-   if (hpage_pincount_available(page))
-   hpage_pincount_sub(page, refs);
-   else
-   refs *= GUP_PIN_COUNTING_BIAS;
-   }
 -  do {
 -  pte.pte_low = ptep->pte_low;
 -  smp_rmb();
 -  pte.pte_high = ptep->pte_high;
 -  smp_rmb();
 -  } while (unlikely(pte.pte_low != ptep->pte_low));
--
-   VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
-   /*
-* Calling put_page() for each ref is unnecessarily slow. Only the last
-* ref needs a put_page().
-*/
-   if (refs > 1)
-   page_ref_sub(page, refs - 1);
-   put_page(page);
 -  return pte;
 -}
 -#else /* CONFIG_GUP_GET_PTE_LOW_HIGH */
 -/*
 - * We require that the PTE can be read atomically.
 - */
 -static inline pte_t gup_get_pte(pte_t *ptep)
 -{
 -  return ptep_get(ptep);
--}
 -#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */
--
  static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
unsigned int flags,
struct page **pages)
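The comment block removed in the diff above documents the x86_32 PAE trick for reading a 64-bit PTE without locks: load pte_low, then pte_high, then re-check pte_low and retry if it changed. A minimal user-space sketch of that retry loop (the demo_ names are hypothetical, and the smp_rmb() barriers needed for real concurrent use are elided here):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit PTE stored as two 32-bit halves,
 * mirroring the CONFIG_GUP_GET_PTE_LOW_HIGH layout discussed above. */
struct demo_pte {
	volatile uint32_t pte_low;
	volatile uint32_t pte_high;
};

/* Lockless read: load low, then high, then re-check low.  If low changed,
 * a writer raced with us and we retry.  The kernel version inserts
 * smp_rmb() between the loads; this single-threaded sketch omits them. */
static uint64_t demo_gup_get_pte(struct demo_pte *ptep)
{
	uint32_t low, high;

	do {
		low = ptep->pte_low;
		high = ptep->pte_high;
	} while (low != ptep->pte_low);

	return ((uint64_t)high << 32) | low;
}
```

The guarantee the removed comment relies on is that a present pte_low value is only ever paired with its matching pte_high, so a torn read can never look like a valid present PTE.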




Re: linux-next: manual merge of the akpm-current tree with the tip tree

2020-11-30 Thread Thomas Gleixner
On Fri, Nov 27 2020 at 13:54, Andy Shevchenko wrote:
>> I fixed it up (see below) and can carry the fix as necessary. This
>> is now fixed as far as linux-next is concerned, but any non trivial
>> conflicts should be mentioned to your upstream maintainer when your tree
>> is submitted for merging.  You may also want to consider cooperating
>> with the maintainer of the conflicting tree to minimise any particularly
>> complex conflicts.
>
> Thanks, from my perspective looks good, dunno if scheduler part is okay.

The final outcome in -next looks correct.

Thanks,

tglx


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2020-11-27 Thread Andy Shevchenko
On Fri, Nov 27, 2020 at 06:39:24PM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the akpm-current tree got a conflict in:
> 
>   include/linux/kernel.h
> 
> between commit:
> 
>   74d862b682f5 ("sched: Make migrate_disable/enable() independent of RT")
> 
> from the tip tree and commit:
> 
>   761ace49e56f ("kernel.h: Split out mathematical helpers")
> 
> from the akpm-current tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Thanks, from my perspective looks good, dunno if scheduler part is okay.

> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc include/linux/kernel.h
> index dbf6018fc312,f97ab3283a8b..
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@@ -272,48 -145,13 +159,6 @@@ extern void __cant_migrate(const char *
>   
>   #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
>   
> - /**
> -  * abs - return absolute value of an argument
> -  * @x: the value.  If it is unsigned type, it is converted to signed type first.
> -  * char is treated as if it was signed (regardless of whether it really is)
> -  * but the macro's return type is preserved as char.
> -  *
> -  * Return: an absolute value of x.
> -  */
> - #define abs(x)  __abs_choose_expr(x, long long,  \
> - __abs_choose_expr(x, long,  \
> - __abs_choose_expr(x, int,   \
> - __abs_choose_expr(x, short, \
> - __abs_choose_expr(x, char,  \
> - __builtin_choose_expr(  \
> - __builtin_types_compatible_p(typeof(x), char),  \
> - (char)({ signed char __x = (x); __x<0?-__x:__x; }), \
> - ((void)0)))
> - 
> - #define __abs_choose_expr(x, type, other) __builtin_choose_expr(\
> - __builtin_types_compatible_p(typeof(x),   signed type) ||   \
> - __builtin_types_compatible_p(typeof(x), unsigned type), \
> - ({ signed type __x = (x); __x < 0 ? -__x : __x; }), other)
> - 
> - /**
> -  * reciprocal_scale - "scale" a value into range [0, ep_ro)
> -  * @val: value
> -  * @ep_ro: right open interval endpoint
> -  *
> -  * Perform a "reciprocal multiplication" in order to "scale" a value into
> -  * range [0, @ep_ro), where the upper interval endpoint is right-open.
> -  * This is useful, e.g. for accessing a index of an array containing
> -  * @ep_ro elements, for example. Think of it as sort of modulus, only that
> -  * the result isn't that of modulo. ;) Note that if initial input is a
> -  * small value, then result will return 0.
> -  *
> -  * Return: a result based on @val in interval [0, @ep_ro).
> -  */
> - static inline u32 reciprocal_scale(u32 val, u32 ep_ro)
> - {
> - return (u32)(((u64) val * ep_ro) >> 32);
> - }
>  -#ifndef CONFIG_PREEMPT_RT
>  -# define cant_migrate() cant_sleep()
>  -#else
>  -  /* Placeholder for now */
>  -# define cant_migrate() do { } while (0)
>  -#endif
> --
>   #if defined(CONFIG_MMU) && \
>   (defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP))
>   #define might_fault() __might_fault(__FILE__, __LINE__)



-- 
With Best Regards,
Andy Shevchenko
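For reference, the reciprocal_scale() helper that the kernel.h split moves in the diff above is small enough to exercise standalone. The sketch below copies its body, substituting plain stdint typedefs for the kernel's u32/u64:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Maps a full-range 32-bit value into [0, ep_ro) using one widening
 * multiply and a shift instead of a modulo: conceptually it computes
 * val / 2^32 (a fraction in [0, 1)) times ep_ro. */
static inline u32 reciprocal_scale(u32 val, u32 ep_ro)
{
	return (u32)(((u64)val * ep_ro) >> 32);
}
```

As the kernel-doc being moved notes, small inputs map to 0, and the result is a scaled value rather than a true modulus.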




linux-next: manual merge of the akpm-current tree with the tip tree

2020-11-26 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  mm/highmem.c

between commits:

  298fa1ad5571 ("highmem: Provide generic variant of kmap_atomic*")
  5fbda3ecd14a ("sched: highmem: Store local kmaps in task struct")

from the tip tree and commit:

  72d22a0d0e86 ("mm: support THPs in zero_user_segments")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc mm/highmem.c
index 83f9660f168f,e2da8c9770e9..
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@@ -358,260 -367,68 +358,319 @@@ void kunmap_high(struct page *page
if (need_wakeup)
wake_up(&pkmap_map_wait);
  }
 -
  EXPORT_SYMBOL(kunmap_high);
+ 
+ #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ void zero_user_segments(struct page *page, unsigned start1, unsigned end1,
+   unsigned start2, unsigned end2)
+ {
+   unsigned int i;
+ 
+   BUG_ON(end1 > page_size(page) || end2 > page_size(page));
+ 
+   for (i = 0; i < compound_nr(page); i++) {
+   void *kaddr;
+   unsigned this_end;
+ 
+   if (end1 == 0 && start2 >= PAGE_SIZE) {
+   start2 -= PAGE_SIZE;
+   end2 -= PAGE_SIZE;
+   continue;
+   }
+ 
+   if (start1 >= PAGE_SIZE) {
+   start1 -= PAGE_SIZE;
+   end1 -= PAGE_SIZE;
+   if (start2) {
+   start2 -= PAGE_SIZE;
+   end2 -= PAGE_SIZE;
+   }
+   continue;
+   }
+ 
+   kaddr = kmap_atomic(page + i);
+ 
+   this_end = min_t(unsigned, end1, PAGE_SIZE);
+   if (end1 > start1)
+   memset(kaddr + start1, 0, this_end - start1);
+   end1 -= this_end;
+   start1 = 0;
+ 
+   if (start2 >= PAGE_SIZE) {
+   start2 -= PAGE_SIZE;
+   end2 -= PAGE_SIZE;
+   } else {
+   this_end = min_t(unsigned, end2, PAGE_SIZE);
+   if (end2 > start2)
+   memset(kaddr + start2, 0, this_end - start2);
+   end2 -= this_end;
+   start2 = 0;
+   }
+ 
+   kunmap_atomic(kaddr);
+   flush_dcache_page(page + i);
+ 
+   if (!end1 && !end2)
+   break;
+   }
+ 
+   BUG_ON((start1 | start2 | end1 | end2) != 0);
+ }
+ EXPORT_SYMBOL(zero_user_segments);
+ #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 -#endif/* CONFIG_HIGHMEM */
 +#endif /* CONFIG_HIGHMEM */
 +
 +#ifdef CONFIG_KMAP_LOCAL
 +
 +#include 
 +
 +/*
 + * With DEBUG_KMAP_LOCAL the stack depth is doubled and every second
 + * slot is unused which acts as a guard page
 + */
 +#ifdef CONFIG_DEBUG_KMAP_LOCAL
 +# define KM_INCR  2
 +#else
 +# define KM_INCR  1
 +#endif
 +
 +static inline int kmap_local_idx_push(void)
 +{
 +  WARN_ON_ONCE(in_irq() && !irqs_disabled());
 +  current->kmap_ctrl.idx += KM_INCR;
 +  BUG_ON(current->kmap_ctrl.idx >= KM_MAX_IDX);
 +  return current->kmap_ctrl.idx - 1;
 +}
 +
 +static inline int kmap_local_idx(void)
 +{
 +  return current->kmap_ctrl.idx - 1;
 +}
 +
 +static inline void kmap_local_idx_pop(void)
 +{
 +  current->kmap_ctrl.idx -= KM_INCR;
 +  BUG_ON(current->kmap_ctrl.idx < 0);
 +}
 +
 +#ifndef arch_kmap_local_post_map
 +# define arch_kmap_local_post_map(vaddr, pteval)  do { } while (0)
 +#endif
 +
 +#ifndef arch_kmap_local_pre_unmap
 +# define arch_kmap_local_pre_unmap(vaddr) do { } while (0)
 +#endif
 +
 +#ifndef arch_kmap_local_post_unmap
 +# define arch_kmap_local_post_unmap(vaddr)do { } while (0)
 +#endif
 +
 +#ifndef arch_kmap_local_map_idx
 +#define arch_kmap_local_map_idx(idx, pfn) kmap_local_calc_idx(idx)
 +#endif
 +
 +#ifndef arch_kmap_local_unmap_idx
 +#define arch_kmap_local_unmap_idx(idx, vaddr) kmap_local_calc_idx(idx)
 +#endif
 +
 +#ifndef arch_kmap_local_high_get
 +static inline void *arch_kmap_local_high_get(struct page *page)
 +{
 +  return NULL;
 +}
 +#endif
 +
 +/* Unmap a local mapping which was obtained by kmap_high_get() */
 +static inline bool kmap_high_unmap_local(unsigned long vaddr)
 +{
 +#ifdef ARCH_NEEDS_KMAP_HIGH_GET
 +  if (vaddr >= PKMAP_ADDR(0) && vaddr < PKMAP_ADDR(LAST_PKMAP)) {
 +  kunmap_high(pte_page(pkmap_page_table[PKMAP_NR(vaddr)]));
 +  return true;
 +  }
 +#endif
 +  return false;
 +}
 +
 +static 

linux-next: manual merge of the akpm-current tree with the tip tree

2020-11-26 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  include/linux/kernel.h

between commit:

  74d862b682f5 ("sched: Make migrate_disable/enable() independent of RT")

from the tip tree and commit:

  761ace49e56f ("kernel.h: Split out mathematical helpers")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/kernel.h
index dbf6018fc312,f97ab3283a8b..
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@@ -272,48 -145,13 +159,6 @@@ extern void __cant_migrate(const char *
  
  #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
  
- /**
-  * abs - return absolute value of an argument
-  * @x: the value.  If it is unsigned type, it is converted to signed type first.
-  * char is treated as if it was signed (regardless of whether it really is)
-  * but the macro's return type is preserved as char.
-  *
-  * Return: an absolute value of x.
-  */
- #define abs(x) __abs_choose_expr(x, long long, \
-   __abs_choose_expr(x, long,  \
-   __abs_choose_expr(x, int,   \
-   __abs_choose_expr(x, short, \
-   __abs_choose_expr(x, char,  \
-   __builtin_choose_expr(  \
-   __builtin_types_compatible_p(typeof(x), char),  \
-   (char)({ signed char __x = (x); __x<0?-__x:__x; }), \
-   ((void)0)))
- 
- #define __abs_choose_expr(x, type, other) __builtin_choose_expr(  \
-   __builtin_types_compatible_p(typeof(x),   signed type) ||   \
-   __builtin_types_compatible_p(typeof(x), unsigned type), \
-   ({ signed type __x = (x); __x < 0 ? -__x : __x; }), other)
- 
- /**
-  * reciprocal_scale - "scale" a value into range [0, ep_ro)
-  * @val: value
-  * @ep_ro: right open interval endpoint
-  *
-  * Perform a "reciprocal multiplication" in order to "scale" a value into
-  * range [0, @ep_ro), where the upper interval endpoint is right-open.
-  * This is useful, e.g. for accessing a index of an array containing
-  * @ep_ro elements, for example. Think of it as sort of modulus, only that
-  * the result isn't that of modulo. ;) Note that if initial input is a
-  * small value, then result will return 0.
-  *
-  * Return: a result based on @val in interval [0, @ep_ro).
-  */
- static inline u32 reciprocal_scale(u32 val, u32 ep_ro)
- {
-   return (u32)(((u64) val * ep_ro) >> 32);
- }
 -#ifndef CONFIG_PREEMPT_RT
 -# define cant_migrate()   cant_sleep()
 -#else
 -  /* Placeholder for now */
 -# define cant_migrate()   do { } while (0)
 -#endif
--
  #if defined(CONFIG_MMU) && \
(defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_DEBUG_ATOMIC_SLEEP))
  #define might_fault() __might_fault(__FILE__, __LINE__)




linux-next: manual merge of the akpm-current tree with the tip tree

2020-11-23 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  include/linux/mm.h

between commit:

  95bb7c42ac8a ("mm: Add 'mprotect' hook to struct vm_operations_struct")

from the tip tree and commit:

  6dd8e5dab7c1 ("mremap: don't allow MREMAP_DONTUNMAP on special_mappings and aio")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/mm.h
index e877401baae6,cd50a37aa76d..
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@@ -557,15 -557,9 +557,16 @@@ enum page_entry_size 
  struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
-   int (*split)(struct vm_area_struct * area, unsigned long addr);
-   int (*mremap)(struct vm_area_struct * area);
+   /* Called any time before splitting to check if it's allowed */
+   int (*may_split)(struct vm_area_struct *area, unsigned long addr);
+   int (*mremap)(struct vm_area_struct *area, unsigned long flags);
 +  /*
 +   * Called by mprotect() to make driver-specific permission
 +   * checks before mprotect() is finalised.   The VMA must not
 +  * be modified.  Returns 0 if mprotect() can proceed.
 +   */
 +  int (*mprotect)(struct vm_area_struct *vma, unsigned long start,
 +  unsigned long end, unsigned long newflags);
vm_fault_t (*fault)(struct vm_fault *vmf);
vm_fault_t (*huge_fault)(struct vm_fault *vmf,
enum page_entry_size pe_size);




linux-next: manual merge of the akpm-current tree with the tip tree

2020-11-08 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  arch/arc/Kconfig

between commit:

  39cac191ff37 ("arc/mm/highmem: Use generic kmap atomic implementation")

from the tip tree and commit:

  b41c56d2a9e6 ("arc: use FLATMEM with freeing of unused memory map instead of DISCONTIGMEM")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/arc/Kconfig
index 1a1ee5c4c2e7,c874f8ab0341..
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@@ -505,8 -507,7 +506,8 @@@ config LINUX_RAM_BAS
  
  config HIGHMEM
bool "High Memory Support"
-   select ARCH_DISCONTIGMEM_ENABLE
+   select HAVE_ARCH_PFN_VALID
 +  select KMAP_LOCAL
help
  With ARC 2G:2G address split, only upper 2G is directly addressable by
  kernel. Enable this to potentially allow access to rest of 2G and PAE




linux-next: manual merge of the akpm-current tree with the tip tree

2020-10-13 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  include/linux/sched.h

between commit:

  d741bf41d7c7 ("kprobes: Remove kretprobe hash")

from the tip tree and commit:

  faf4ffbfd1c5 ("fs/buffer.c: add debug print for __getblk_gfp() stall problem")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/sched.h
index 1695d45c2d7a,a360da173c32..
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@@ -1322,10 -1320,13 +1327,17 @@@ struct task_struct 
struct callback_headmce_kill_me;
  #endif
  
 +#ifdef CONFIG_KRETPROBES
 +  struct llist_head   kretprobe_instances;
 +#endif
 +
+ #ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+   unsigned long   getblk_stamp;
+   unsigned intgetblk_executed;
+   unsigned intgetblk_bh_count;
+   unsigned long   getblk_bh_state;
+ #endif
+ 
/*
 * New fields for task_struct should be added above here, so that
 * they are included in the randomized portion of task_struct.




linux-next: manual merge of the akpm-current tree with the tip tree

2020-07-17 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  lib/cpumask.c

between commit:

  1abdfe706a57 ("lib: Restrict cpumask_local_spread to houskeeping CPUs")

from the tip tree and commit:

  6f7ee3fd63c9 ("lib: optimize cpumask_local_spread()")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc lib/cpumask.c
index 85da6ab4fbb5,2fecbcd8c160..
--- a/lib/cpumask.c
+++ b/lib/cpumask.c
@@@ -6,7 -6,7 +6,8 @@@
  #include 
  #include 
  #include 
 +#include 
+ #include 
  
  /**
   * cpumask_next - get the next cpu in a cpumask
@@@ -193,40 -193,56 +194,61 @@@ void __init free_bootmem_cpumask_var(cp
  }
  #endif
  
- /**
-  * cpumask_local_spread - select the i'th cpu with local numa cpu's first
-  * @i: index number
-  * @node: local numa_node
-  *
-  * This function selects an online CPU according to a numa aware policy;
-  * local cpus are returned first, followed by non-local ones, then it
-  * wraps around.
-  *
-  * It's not very efficient, but useful for setup.
-  */
- unsigned int cpumask_local_spread(unsigned int i, int node)
+ static void calc_node_distance(int *node_dist, int node)
+ {
+   int i;
+ 
+   for (i = 0; i < nr_node_ids; i++)
+   node_dist[i] = node_distance(node, i);
+ }
+ 
+ static int find_nearest_node(int *node_dist, bool *used)
+ {
+   int i, min_dist = node_dist[0], node_id = -1;
+ 
+   /* Choose the first unused node to compare */
+   for (i = 0; i < nr_node_ids; i++) {
+   if (used[i] == 0) {
+   min_dist = node_dist[i];
+   node_id = i;
+   break;
+   }
+   }
+ 
+   /* Compare and return the nearest node */
+   for (i = 0; i < nr_node_ids; i++) {
+   if (node_dist[i] < min_dist && used[i] == 0) {
+   min_dist = node_dist[i];
+   node_id = i;
+   }
+   }
+ 
+   return node_id;
+ }
+ 
+ static unsigned int __cpumask_local_spread(unsigned int i, int node)
  {
 -  int cpu;
 +  int cpu, hk_flags;
 +  const struct cpumask *mask;
  
 +  hk_flags = HK_FLAG_DOMAIN | HK_FLAG_MANAGED_IRQ;
 +  mask = housekeeping_cpumask(hk_flags);
/* Wrap: we always want a cpu. */
 -  i %= num_online_cpus();
 +  i %= cpumask_weight(mask);
  
if (node == NUMA_NO_NODE) {
 -  for_each_cpu(cpu, cpu_online_mask)
 +  for_each_cpu(cpu, mask) {
if (i-- == 0)
return cpu;
 +  }
} else {
/* NUMA first. */
 -  for_each_cpu_and(cpu, cpumask_of_node(node), cpu_online_mask)
 +  for_each_cpu_and(cpu, cpumask_of_node(node), mask) {
if (i-- == 0)
return cpu;
 +  }
  
 -  for_each_cpu(cpu, cpu_online_mask) {
 +  for_each_cpu(cpu, mask) {
/* Skip NUMA nodes, done above. */
if (cpumask_test_cpu(cpu, cpumask_of_node(node)))
continue;
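The find_nearest_node() helper added in the diff above is a plain two-pass minimum search over the not-yet-used nodes. A standalone sketch (the demo_ prefix and explicit nr_node_ids parameter are user-space adaptations, not the kernel interface):

```c
#include <assert.h>
#include <stdbool.h>

/* Given per-node distances and a "used" mask, return the unused node
 * with the smallest distance, or -1 if every node has been used. */
static int demo_find_nearest_node(const int *node_dist, const bool *used,
				  int nr_node_ids)
{
	int i, min_dist = 0, node_id = -1;

	/* Seed with the first unused node... */
	for (i = 0; i < nr_node_ids; i++) {
		if (!used[i]) {
			min_dist = node_dist[i];
			node_id = i;
			break;
		}
	}

	/* ...then keep the nearest unused one. */
	for (i = 0; i < nr_node_ids; i++) {
		if (!used[i] && node_dist[i] < min_dist) {
			min_dist = node_dist[i];
			node_id = i;
		}
	}

	return node_id;
}
```

Calling this repeatedly while marking each returned node as used visits nodes in increasing distance order, which is how the optimized cpumask_local_spread() walks remote NUMA nodes.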




Re: linux-next: manual merge of the akpm-current tree with the tip tree

2020-06-02 Thread Stephen Rothwell
Hi all,

On Mon, 25 May 2020 21:04:43 +1000 Stephen Rothwell wrote:
>
> Today's linux-next merge of the akpm-current tree got a conflict in:
> 
>   arch/x86/mm/tlb.c
> 
> between commit:
> 
>   83ce56f712af ("x86/mm: Refactor cond_ibpb() to support other use cases")
> 
> from the tip tree and commit:
> 
>   36c8e34d03a1 ("x86/mm: remove vmalloc faulting")
> 
> from the akpm-current tree.
> 
> diff --cc arch/x86/mm/tlb.c
> index c8524c506ab0,f3fe261e5936..
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@@ -345,48 -161,16 +345,20 @@@ void switch_mm(struct mm_struct *prev, 
>   local_irq_restore(flags);
>   }
>   
> - static void sync_current_stack_to_mm(struct mm_struct *mm)
> - {
> - unsigned long sp = current_stack_pointer;
> - pgd_t *pgd = pgd_offset(mm, sp);
> - 
> - if (pgtable_l5_enabled()) {
> - if (unlikely(pgd_none(*pgd))) {
> - pgd_t *pgd_ref = pgd_offset_k(sp);
> - 
> - set_pgd(pgd, *pgd_ref);
> - }
> - } else {
> - /*
> -  * "pgd" is faked.  The top level entries are "p4d"s, so sync
> -  * the p4d.  This compiles to approximately the same code as
> -  * the 5-level case.
> -  */
> - p4d_t *p4d = p4d_offset(pgd, sp);
> - 
> - if (unlikely(p4d_none(*p4d))) {
> - pgd_t *pgd_ref = pgd_offset_k(sp);
> - p4d_t *p4d_ref = p4d_offset(pgd_ref, sp);
> - 
> - set_p4d(p4d, *p4d_ref);
> - }
> - }
> - }
> - 
>  -static inline unsigned long mm_mangle_tif_spec_ib(struct task_struct *next)
>  +static inline unsigned long mm_mangle_tif_spec_bits(struct task_struct *next)
>   {
>   unsigned long next_tif = task_thread_info(next)->flags;
>  -unsigned long ibpb = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_IBPB;
>  +unsigned long spec_bits = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_SPEC_MASK;
>   
>  -return (unsigned long)next->mm | ibpb;
>  +BUILD_BUG_ON(TIF_SPEC_L1D_FLUSH != TIF_SPEC_IB + 1);
>  +
>  +return (unsigned long)next->mm | spec_bits;
>   }
>   
>  -static void cond_ibpb(struct task_struct *next)
>  +static void cond_mitigation(struct task_struct *next)
>   {
>  +unsigned long prev_mm, next_mm;
>  +
>   if (!next || !next->mm)
>   return;
>   
> @@@ -587,20 -343,12 +559,11 @@@ void switch_mm_irqs_off(struct mm_struc
>   need_flush = true;
>   } else {
>   /*
>  - * Avoid user/user BTB poisoning by flushing the branch
>  - * predictor when switching between processes. This stops
>  - * one process from doing Spectre-v2 attacks on another.
>  + * Apply process to process speculation vulnerability
>  + * mitigations if applicable.
>*/
>  -cond_ibpb(tsk);
>  +cond_mitigation(tsk);
>   
> - if (IS_ENABLED(CONFIG_VMAP_STACK)) {
> - /*
> -  * If our current stack is in vmalloc space and isn't
> -  * mapped in the new pgd, we'll double-fault.  Forcibly
> -  * map it.
> -  */
> - sync_current_stack_to_mm(next);
> - }
> - 
>   /*
>* Stop remote flushes for the previous mm.
>* Skip kernel threads; we never send init_mm TLB flushing IPIs,

This is now a conflict between commit

  94709049fb84 ("Merge branch 'akpm' (patches from Andrew)")

from Linus' tree and the above tip tree commit.

-- 
Cheers,
Stephen Rothwell




linux-next: manual merge of the akpm-current tree with the tip tree

2020-05-29 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  arch/x86/include/asm/efi.h

between commit:

  9b47c5275614 ("efi/libstub: Add definitions for console input and events")

from the tip tree and patch:

  "mm: reorder includes after introduction of linux/pgtable.h"

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 129e62146cbc..e7d2ccfdd507 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -3,13 +3,13 @@
 #define _ASM_X86_EFI_H
 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 extern unsigned long efi_fw_vendor, efi_config_table;
 




linux-next: manual merge of the akpm-current tree with the tip tree

2020-05-29 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  mm/swap.c

between commit:

  b01b2141 ("mm/swap: Use local_lock for protection")

from the tip tree and commit:

  48c1ce8726a7 ("mm: fold and remove lru_cache_add_anon() and lru_cache_add_file()")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc mm/swap.c
index 0ac463d44cff,acd88873f076..
--- a/mm/swap.c
+++ b/mm/swap.c
@@@ -468,10 -435,17 +459,19 @@@ EXPORT_SYMBOL(mark_page_accessed)
   */
  void lru_cache_add(struct page *page)
  {
 -  struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
++  struct pagevec *pvec;
+ 
VM_BUG_ON_PAGE(PageActive(page) && PageUnevictable(page), page);
VM_BUG_ON_PAGE(PageLRU(page), page);
-   __lru_cache_add(page);
+ 
++  local_lock(&lru_pvecs.lock);
++  pvec = this_cpu_ptr(&lru_pvecs.lru_add);
+   get_page(page);
+   if (!pagevec_add(pvec, page) || PageCompound(page))
+   __pagevec_lru_add(pvec);
 -  put_cpu_var(lru_add_pvec);
++  local_unlock(&lru_pvecs.lock);
  }
+ EXPORT_SYMBOL(lru_cache_add);
  
  /**
   * lru_cache_add_active_or_unevictable




linux-next: manual merge of the akpm-current tree with the tip tree

2020-05-29 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  include/linux/sched.h

between commits:

  5567d11c21a1 ("x86/mce: Send #MC singal from task work")

from the tip tree and commit:

  e87f27165be1 ("fs/buffer.c: add debug print for __getblk_gfp() stall problem")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/sched.h
index 5216bd5ff4fb,98060427c53f..
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@@ -1303,14 -1293,13 +1307,21 @@@ struct task_struct 
unsigned long   prev_lowest_stack;
  #endif
  
 +#ifdef CONFIG_X86_MCE
 +  u64 mce_addr;
 +  __u64   mce_ripv : 1,
 +  mce_whole_page : 1,
 +  __mce_reserved : 62;
 +  struct callback_headmce_kill_me;
 +#endif
 +
+ #ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+   unsigned long   getblk_stamp;
+   unsigned intgetblk_executed;
+   unsigned intgetblk_bh_count;
+   unsigned long   getblk_bh_state;
+ #endif
+ 
/*
 * New fields for task_struct should be added above here, so that
 * they are included in the randomized portion of task_struct.




linux-next: manual merge of the akpm-current tree with the tip tree

2020-05-29 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  fs/squashfs/decompressor_multi_percpu.c

between commit:

  fd56200a16c7 ("squashfs: Make use of local lock in multi_cpu decompressor")

from the tip tree and commit:

  5697b27554f3 ("squashfs-migrate-from-ll_rw_block-usage-to-bio-fix")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc fs/squashfs/decompressor_multi_percpu.c
index e206ebfe003c,d93e12d9b712..
--- a/fs/squashfs/decompressor_multi_percpu.c
+++ b/fs/squashfs/decompressor_multi_percpu.c
@@@ -75,19 -72,18 +75,18 @@@ void squashfs_decompressor_destroy(stru
}
  }
  
- int squashfs_decompress(struct squashfs_sb_info *msblk, struct buffer_head 
**bh,
-   int b, int offset, int length, struct squashfs_page_actor *output)
+ int squashfs_decompress(struct squashfs_sb_info *msblk, struct bio *bio,
+   int offset, int length, struct squashfs_page_actor *output)
  {
 -  struct squashfs_stream __percpu *percpu;
struct squashfs_stream *stream;
int res;
  
 -  percpu = (struct squashfs_stream __percpu *)msblk->stream;
 -  stream = get_cpu_ptr(percpu);
 +  local_lock(&msblk->stream->lock);
 +  stream = this_cpu_ptr(msblk->stream);
 +
-   res = msblk->decompressor->decompress(msblk, stream->stream, bh, b,
-   offset, length, output);
- 
+   res = msblk->decompressor->decompress(msblk, stream->stream, bio,
+ offset, length, output);
 -  put_cpu_ptr(stream);
 +  local_unlock(&msblk->stream->lock);
  
if (res < 0)
ERROR("%s decompression failed, data probably corrupt\n",




Re: linux-next: manual merge of the akpm-current tree with the tip tree

2020-05-25 Thread Singh, Balbir
On Mon, 2020-05-25 at 21:04 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the akpm-current tree got a conflict in:
> 
>   arch/x86/mm/tlb.c
> 
> between commit:
> 
>   83ce56f712af ("x86/mm: Refactor cond_ibpb() to support other use cases")
> 
> from the tip tree and commit:
> 
>   36c8e34d03a1 ("x86/mm: remove vmalloc faulting")
> 
> from the akpm-current tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 

The changes look reasonable to me (in terms of the merge resolution).

Acked-by: Balbir Singh 



linux-next: manual merge of the akpm-current tree with the tip tree

2020-05-25 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  arch/x86/mm/tlb.c

between commit:

  83ce56f712af ("x86/mm: Refactor cond_ibpb() to support other use cases")

from the tip tree and commit:

  36c8e34d03a1 ("x86/mm: remove vmalloc faulting")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/x86/mm/tlb.c
index c8524c506ab0,f3fe261e5936..
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@@ -345,48 -161,16 +345,20 @@@ void switch_mm(struct mm_struct *prev, 
local_irq_restore(flags);
  }
  
- static void sync_current_stack_to_mm(struct mm_struct *mm)
- {
-   unsigned long sp = current_stack_pointer;
-   pgd_t *pgd = pgd_offset(mm, sp);
- 
-   if (pgtable_l5_enabled()) {
-   if (unlikely(pgd_none(*pgd))) {
-   pgd_t *pgd_ref = pgd_offset_k(sp);
- 
-   set_pgd(pgd, *pgd_ref);
-   }
-   } else {
-   /*
-* "pgd" is faked.  The top level entries are "p4d"s, so sync
-* the p4d.  This compiles to approximately the same code as
-* the 5-level case.
-*/
-   p4d_t *p4d = p4d_offset(pgd, sp);
- 
-   if (unlikely(p4d_none(*p4d))) {
-   pgd_t *pgd_ref = pgd_offset_k(sp);
-   p4d_t *p4d_ref = p4d_offset(pgd_ref, sp);
- 
-   set_p4d(p4d, *p4d_ref);
-   }
-   }
- }
- 
 -static inline unsigned long mm_mangle_tif_spec_ib(struct task_struct *next)
 +static inline unsigned long mm_mangle_tif_spec_bits(struct task_struct *next)
  {
unsigned long next_tif = task_thread_info(next)->flags;
 -  unsigned long ibpb = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_IBPB;
 +  unsigned long spec_bits = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_SPEC_MASK;
  
 -  return (unsigned long)next->mm | ibpb;
 +  BUILD_BUG_ON(TIF_SPEC_L1D_FLUSH != TIF_SPEC_IB + 1);
 +
 +  return (unsigned long)next->mm | spec_bits;
  }
  
 -static void cond_ibpb(struct task_struct *next)
 +static void cond_mitigation(struct task_struct *next)
  {
 +  unsigned long prev_mm, next_mm;
 +
if (!next || !next->mm)
return;
  
@@@ -587,20 -343,12 +559,11 @@@ void switch_mm_irqs_off(struct mm_struc
need_flush = true;
} else {
/*
 -   * Avoid user/user BTB poisoning by flushing the branch
 -   * predictor when switching between processes. This stops
 -   * one process from doing Spectre-v2 attacks on another.
 +   * Apply process to process speculation vulnerability
 +   * mitigations if applicable.
 */
 -  cond_ibpb(tsk);
 +  cond_mitigation(tsk);
  
-   if (IS_ENABLED(CONFIG_VMAP_STACK)) {
-   /*
-* If our current stack is in vmalloc space and isn't
-* mapped in the new pgd, we'll double-fault.  Forcibly
-* map it.
-*/
-   sync_current_stack_to_mm(next);
-   }
- 
/*
 * Stop remote flushes for the previous mm.
 * Skip kernel threads; we never send init_mm TLB flushing IPIs,




linux-next: manual merge of the akpm-current tree with the tip tree

2020-05-19 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  kernel/kprobes.c

between commit:

  4fdd88877e52 ("kprobes: Lock kprobe_mutex while showing kprobe_blacklist")

from the tip tree and commit:

  71294f4f8167 ("kernel/kprobes.c: convert to use DEFINE_SEQ_ATTRIBUTE macro")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc kernel/kprobes.c
index 9622ee05f5fa,9146e1a8373b..
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@@ -2506,15 -2436,10 +2496,15 @@@ static int kprobe_blacklist_seq_show(st
return 0;
  }
  
 +static void kprobe_blacklist_seq_stop(struct seq_file *f, void *v)
 +{
 +  mutex_unlock(&kprobe_mutex);
 +}
 +
- static const struct seq_operations kprobe_blacklist_seq_ops = {
+ static const struct seq_operations kprobe_blacklist_sops = {
.start = kprobe_blacklist_seq_start,
.next  = kprobe_blacklist_seq_next,
 -  .stop  = kprobe_seq_stop,   /* Reuse void function */
 +  .stop  = kprobe_blacklist_seq_stop,
.show  = kprobe_blacklist_seq_show,
  };
  




linux-next: manual merge of the akpm-current tree with the tip tree

2019-06-24 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  lib/debugobjects.c

between commit:

  d5f34153e526 ("debugobjects: Move printk out of db->lock critical sections")

from the tip tree and commit:

  8b6b497dfb11 ("lib/debugobjects.c: move printk out of db lock critical 
sections")

from the akpm-current tree.

I fixed it up (I reverted the akpm-current tree version) and can carry the
fix as necessary. This is now fixed as far as linux-next is concerned,
but any non trivial conflicts should be mentioned to your upstream
maintainer when your tree is submitted for merging.  You may also want
to consider cooperating with the maintainer of the conflicting tree to
minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




linux-next: manual merge of the akpm-current tree with the tip tree

2019-05-01 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  mm/vmalloc.c

between commit:

  bade3b4bdcdb ("mm/vmalloc.c: refactor __vunmap() to avoid duplicated call to 
find_vm_area()")

from the tip tree and commit:

  868b104d7379 ("mm/vmalloc: Add flag for freeing of special permsissions")

from the akpm-current tree.

I fixed it up (I made an attempt at a fix up - see below) and can carry
the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the
conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc mm/vmalloc.c
index e5e9e1fcac01,4a91acce4b5f..
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@@ -1490,94 -2103,16 +2110,83 @@@ static struct vm_struct *__remove_vm_ar
   */
  struct vm_struct *remove_vm_area(const void *addr)
  {
+   struct vm_struct *vm = NULL;
struct vmap_area *va;
  
-   might_sleep();
- 
va = find_vmap_area((unsigned long)addr);
-   if (va && va->flags & VM_VM_AREA) {
-   struct vm_struct *vm = va->vm;
- 
-   spin_lock(&vmap_area_lock);
-   va->vm = NULL;
-   va->flags &= ~VM_VM_AREA;
-   va->flags |= VM_LAZY_FREE;
-   spin_unlock(&vmap_area_lock);
- 
-   kasan_free_shadow(vm);
-   free_unmap_vmap_area(va);
+   if (va && va->flags & VM_VM_AREA)
+   vm = __remove_vm_area(va);
  
-   return vm;
-   }
-   return NULL;
+   return vm;
  }
  
 +static inline void set_area_direct_map(const struct vm_struct *area,
 + int (*set_direct_map)(struct page *page))
 +{
 +  int i;
 +
 +  for (i = 0; i < area->nr_pages; i++)
 +  if (page_address(area->pages[i]))
 +  set_direct_map(area->pages[i]);
 +}
 +
 +/* Handle removing and resetting vm mappings related to the vm_struct. */
- static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
++static void vm_remove_mappings(struct vmap_area *va, int deallocate_pages)
 +{
++  struct vm_struct *area = va->vm;
 +  unsigned long addr = (unsigned long)area->addr;
 +  unsigned long start = ULONG_MAX, end = 0;
 +  int flush_reset = area->flags & VM_FLUSH_RESET_PERMS;
 +  int i;
 +
 +  /*
 +   * The below block can be removed when all architectures that have
 +   * direct map permissions also have set_direct_map_() implementations.
 +   * This is concerned with resetting the direct map any an vm alias with
 +   * execute permissions, without leaving a RW+X window.
 +   */
 +  if (flush_reset && !IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
 +  set_memory_nx(addr, area->nr_pages);
 +  set_memory_rw(addr, area->nr_pages);
 +  }
 +
-   remove_vm_area(area->addr);
++  __remove_vm_area(va);
 +
 +  /* If this is not VM_FLUSH_RESET_PERMS memory, no need for the below. */
 +  if (!flush_reset)
 +  return;
 +
 +  /*
 +   * If not deallocating pages, just do the flush of the VM area and
 +   * return.
 +   */
 +  if (!deallocate_pages) {
 +  vm_unmap_aliases();
 +  return;
 +  }
 +
 +  /*
 +   * If execution gets here, flush the vm mapping and reset the direct
 +   * map. Find the start and end range of the direct mappings to make sure
 +   * the vm_unmap_aliases() flush includes the direct map.
 +   */
 +  for (i = 0; i < area->nr_pages; i++) {
 +  if (page_address(area->pages[i])) {
 +  start = min(addr, start);
 +  end = max(addr, end);
 +  }
 +  }
 +
 +  /*
 +   * Set direct map to something invalid so that it won't be cached if
 +   * there are any accesses after the TLB flush, then flush the TLB and
 +   * reset the direct map permissions to the default.
 +   */
 +  set_area_direct_map(area, set_direct_map_invalid_noflush);
 +  _vm_unmap_aliases(start, end, 1);
 +  set_area_direct_map(area, set_direct_map_default_noflush);
 +}
 +
  static void __vunmap(const void *addr, int deallocate_pages)
  {
struct vm_struct *area;
@@@ -1599,8 -2136,7 +2210,8 @@@
debug_check_no_locks_freed(area->addr, get_vm_area_size(area));
debug_check_no_obj_freed(area->addr, get_vm_area_size(area));
  
-   vm_remove_mappings(area, deallocate_pages);
 -  __remove_vm_area(va);
++  vm_remove_mappings(va, deallocate_pages);
 +
if (deallocate_pages) {
int i;
  




linux-next: manual merge of the akpm-current tree with the tip tree

2019-01-30 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  include/linux/sched.h

between commit:

  15917dc02841 ("sched: Remove stale PF_MUTEX_TESTER bit")

from the tip tree and commit:

  ca299cb98649 ("mm/cma: add PF flag to force non cma alloc")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/sched.h
index bb68abafac29,1ef3995b7564..
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@@ -1409,6 -1423,8 +1423,7 @@@ extern struct pid *cad_pid
  #define PF_UMH            0x02000000  /* I'm an Usermodehelper process */
  #define PF_NO_SETAFFINITY 0x04000000  /* Userland is not allowed to meddle with cpus_allowed */
  #define PF_MCE_EARLY      0x08000000  /* Early kill for mce process policy */
+ #define PF_MEMALLOC_NOCMA 0x10000000  /* All allocation request will have _GFP_MOVABLE cleared */
 -#define PF_MUTEX_TESTER   0x20000000  /* Thread belongs to the rt mutex tester */
  #define PF_FREEZER_SKIP   0x40000000  /* Freezer should not count it as freezable */
  #define PF_SUSPEND_TASK   0x80000000  /* This thread called freeze_processes() and should not be frozen */
  




Re: linux-next: manual merge of the akpm-current tree with the tip tree

2018-08-20 Thread Andrew Morton
On Mon, 20 Aug 2018 14:32:22 +1000 Stephen Rothwell  
wrote:

> Today's linux-next merge of the akpm-current tree got conflicts in:
> 
>   fs/proc/kcore.c
>   include/linux/kcore.h
> 
> between commit:
> 
>   6855dc41b246 ("x86: Add entry trampolines to kcore")
> 
> from the tip tree and commits:
> 
>   4eb27c275abf ("fs/proc/kcore.c: use __pa_symbol() for KCORE_TEXT list 
> entries")
>   ea551910d3f4 ("proc/kcore: clean up ELF header generation")
>   537412a2958f ("proc/kcore: don't grab lock for kclist_add()")
> 
> from the akpm-current tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Yup.

What's happening here?  A two month old patch turns up in linux-next in the
middle of the merge window, in the "perf/urgent" branch.  That's a strange
branch for a June 6 patch!

Is it intended that this material be merged into 4.19-rc1?



linux-next: manual merge of the akpm-current tree with the tip tree

2018-08-19 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got conflicts in:

  fs/proc/kcore.c
  include/linux/kcore.h

between commit:

  6855dc41b246 ("x86: Add entry trampolines to kcore")

from the tip tree and commits:

  4eb27c275abf ("fs/proc/kcore.c: use __pa_symbol() for KCORE_TEXT list 
entries")
  ea551910d3f4 ("proc/kcore: clean up ELF header generation")
  537412a2958f ("proc/kcore: don't grab lock for kclist_add()")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc fs/proc/kcore.c
index 00282f134336,80464432dfe6..
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@@ -448,53 -291,148 +291,151 @@@ static ssize_
  read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t 
*fpos)
  {
char *buf = file->private_data;
-   ssize_t acc = 0;
-   size_t size, tsz;
-   size_t elf_buflen;
+   size_t phdrs_offset, notes_offset, data_offset;
+   size_t phdrs_len, notes_len;
+   struct kcore_list *m;
+   size_t tsz;
int nphdr;
unsigned long start;
+   size_t orig_buflen = buflen;
+   int ret = 0;
  
-   read_lock(&kclist_lock);
-   size = get_kcore_size(&nphdr, &elf_buflen);
+   down_read(&kclist_lock);
+ 
+   get_kcore_size(&nphdr, &phdrs_len, &notes_len, &data_offset);
+   phdrs_offset = sizeof(struct elfhdr);
+   notes_offset = phdrs_offset + phdrs_len;
+ 
+   /* ELF file header. */
+   if (buflen && *fpos < sizeof(struct elfhdr)) {
+   struct elfhdr ehdr = {
+   .e_ident = {
+   [EI_MAG0] = ELFMAG0,
+   [EI_MAG1] = ELFMAG1,
+   [EI_MAG2] = ELFMAG2,
+   [EI_MAG3] = ELFMAG3,
+   [EI_CLASS] = ELF_CLASS,
+   [EI_DATA] = ELF_DATA,
+   [EI_VERSION] = EV_CURRENT,
+   [EI_OSABI] = ELF_OSABI,
+   },
+   .e_type = ET_CORE,
+   .e_machine = ELF_ARCH,
+   .e_version = EV_CURRENT,
+   .e_phoff = sizeof(struct elfhdr),
+   .e_flags = ELF_CORE_EFLAGS,
+   .e_ehsize = sizeof(struct elfhdr),
+   .e_phentsize = sizeof(struct elf_phdr),
+   .e_phnum = nphdr,
+   };
+ 
+   tsz = min_t(size_t, buflen, sizeof(struct elfhdr) - *fpos);
+   if (copy_to_user(buffer, (char *)&ehdr + *fpos, tsz)) {
+   ret = -EFAULT;
+   goto out;
+   }
  
-   if (buflen == 0 || *fpos >= size) {
-   read_unlock(&kclist_lock);
-   return 0;
+   buffer += tsz;
+   buflen -= tsz;
+   *fpos += tsz;
}
  
-   /* trim buflen to not go beyond EOF */
-   if (buflen > size - *fpos)
-   buflen = size - *fpos;
- 
-   /* construct an ELF core header if we'll need some of it */
-   if (*fpos < elf_buflen) {
-   char * elf_buf;
- 
-   tsz = elf_buflen - *fpos;
-   if (buflen < tsz)
-   tsz = buflen;
-   elf_buf = kzalloc(elf_buflen, GFP_ATOMIC);
-   if (!elf_buf) {
-   read_unlock(&kclist_lock);
-   return -ENOMEM;
+   /* ELF program headers. */
+   if (buflen && *fpos < phdrs_offset + phdrs_len) {
+   struct elf_phdr *phdrs, *phdr;
+ 
+   phdrs = kzalloc(phdrs_len, GFP_KERNEL);
+   if (!phdrs) {
+   ret = -ENOMEM;
+   goto out;
}
-   elf_kcore_store_hdr(elf_buf, nphdr, elf_buflen);
-   read_unlock(&kclist_lock);
-   if (copy_to_user(buffer, elf_buf + *fpos, tsz)) {
-   kfree(elf_buf);
-   return -EFAULT;
+ 
+   phdrs[0].p_type = PT_NOTE;
+   phdrs[0].p_offset = notes_offset;
+   phdrs[0].p_filesz = notes_len;
+ 
+   phdr = &phdrs[1];
+   list_for_each_entry(m, &kclist_head, list) {
+   phdr->p_type = PT_LOAD;
+   phdr->p_flags = PF_R | PF_W | PF_X;
+   phdr->p_offset = kc_vaddr_to_offset(m->addr) + data_offset;
 -  phdr->p_vaddr = (size_t)m->addr;
 -  if (m->type == KCORE_RAM)
++  if (m->type == KCORE_REMAP)
++  phdr->p_vaddr   = 


linux-next: manual merge of the akpm-current tree with the tip tree

2018-03-23 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  fs/ocfs2/filecheck.c

between commit:

  e24e960c7fe2 ("sched/wait, fs/ocfs2: Convert wait_on_atomic_t() usage to the 
new wait_var_event() API")

from the tip tree and commit:

  5a5b76d17dc4 ("ocfs2: add kobject for online file check")

from the akpm-current tree.

I fixed it up (the latter removed the code updated by the former) and
can carry the fix as necessary. This is now fixed as far as linux-next
is concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell






linux-next: manual merge of the akpm-current tree with the tip tree

2017-12-17 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  kernel/fork.c

between commit:

  5e28fd0b5fdb ("arch: Allow arch_dup_mmap() to fail")

from the tip tree and commit:

  120bd8608675 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc kernel/fork.c
index bed0eaf7233f,7fccd819866f..
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@@ -391,6 -391,241 +392,240 @@@ void free_task(struct task_struct *tsk
  }
  EXPORT_SYMBOL(free_task);
  
+ #ifdef CONFIG_MMU
+ static __latent_entropy int dup_mmap(struct mm_struct *mm,
+   struct mm_struct *oldmm)
+ {
+   struct vm_area_struct *mpnt, *tmp, *prev, **pprev;
+   struct rb_node **rb_link, *rb_parent;
+   int retval;
+   unsigned long charge;
+   LIST_HEAD(uf);
+ 
+   uprobe_start_dup_mmap();
+   if (down_write_killable(&oldmm->mmap_sem)) {
+   retval = -EINTR;
+   goto fail_uprobe_end;
+   }
+   flush_cache_dup_mm(oldmm);
+   uprobe_dup_mmap(oldmm, mm);
+   /*
+* Not linked in yet - no deadlock potential:
+*/
+   down_write_nested(&mm->mmap_sem, SINGLE_DEPTH_NESTING);
+ 
+   /* No ordering required: file already has been exposed. */
+   RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
+ 
+   mm->total_vm = oldmm->total_vm;
+   mm->data_vm = oldmm->data_vm;
+   mm->exec_vm = oldmm->exec_vm;
+   mm->stack_vm = oldmm->stack_vm;
+ 
+   rb_link = &mm->mm_rb.rb_node;
+   rb_parent = NULL;
+   pprev = &mm->mmap;
+   retval = ksm_fork(mm, oldmm);
+   if (retval)
+   goto out;
+   retval = khugepaged_fork(mm, oldmm);
+   if (retval)
+   goto out;
+ 
+   prev = NULL;
+   for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
+   struct file *file;
+ 
+   if (mpnt->vm_flags & VM_DONTCOPY) {
+   vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
+   continue;
+   }
+   charge = 0;
+   if (mpnt->vm_flags & VM_ACCOUNT) {
+   unsigned long len = vma_pages(mpnt);
+ 
+   if (security_vm_enough_memory_mm(oldmm, len)) /* sic */
+   goto fail_nomem;
+   charge = len;
+   }
+   tmp = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
+   if (!tmp)
+   goto fail_nomem;
+   *tmp = *mpnt;
+   INIT_LIST_HEAD(&tmp->anon_vma_chain);
+   retval = vma_dup_policy(mpnt, tmp);
+   if (retval)
+   goto fail_nomem_policy;
+   tmp->vm_mm = mm;
+   retval = dup_userfaultfd(tmp, &uf);
+   if (retval)
+   goto fail_nomem_anon_vma_fork;
+   if (tmp->vm_flags & VM_WIPEONFORK) {
+   /* VM_WIPEONFORK gets a clean slate in the child. */
+   tmp->anon_vma = NULL;
+   if (anon_vma_prepare(tmp))
+   goto fail_nomem_anon_vma_fork;
+   } else if (anon_vma_fork(tmp, mpnt))
+   goto fail_nomem_anon_vma_fork;
+   tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
+   tmp->vm_next = tmp->vm_prev = NULL;
+   file = tmp->vm_file;
+   if (file) {
+   struct inode *inode = file_inode(file);
+   struct address_space *mapping = file->f_mapping;
+ 
+   get_file(file);
+   if (tmp->vm_flags & VM_DENYWRITE)
+   atomic_dec(&inode->i_writecount);
+   i_mmap_lock_write(mapping);
+   if (tmp->vm_flags & VM_SHARED)
+   atomic_inc(&mapping->i_mmap_writable);
+   flush_dcache_mmap_lock(mapping);
+   /* insert tmp into the share list, just after mpnt */
+   vma_interval_tree_insert_after(tmp, mpnt,
+   &mapping->i_mmap);
+   flush_dcache_mmap_unlock(mapping);
+   i_mmap_unlock_write(mapping);
+   }
+ 
+   /*
+* Clear hugetlb-related page reserves for children. This only
+* affects MAP_PRIVATE mappings. Faults generated by the child
+* are not guaranteed to succeed, even if read-only
+   

linux-next: manual merge of the akpm-current tree with the tip tree

2017-12-17 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  kernel/fork.c

between commit:

  5e28fd0b5fdb ("arch: Allow arch_dup_mmap() to fail")

from the tip tree and commit:

  120bd8608675 ("include/linux/sched/mm.h: uninline mmdrop_async(), etc")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc kernel/fork.c
index bed0eaf7233f,7fccd819866f..
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@@ -391,6 -391,241 +392,240 @@@ void free_task(struct task_struct *tsk
  }
  EXPORT_SYMBOL(free_task);
  
+ #ifdef CONFIG_MMU
+ static __latent_entropy int dup_mmap(struct mm_struct *mm,
+   struct mm_struct *oldmm)
+ {
+   struct vm_area_struct *mpnt, *tmp, *prev, **pprev;
+   struct rb_node **rb_link, *rb_parent;
+   int retval;
+   unsigned long charge;
+   LIST_HEAD(uf);
+ 
+   uprobe_start_dup_mmap();
+   if (down_write_killable(&oldmm->mmap_sem)) {
+   retval = -EINTR;
+   goto fail_uprobe_end;
+   }
+   flush_cache_dup_mm(oldmm);
+   uprobe_dup_mmap(oldmm, mm);
+   /*
+* Not linked in yet - no deadlock potential:
+*/
+   down_write_nested(&mm->mmap_sem, SINGLE_DEPTH_NESTING);
+ 
+   /* No ordering required: file already has been exposed. */
+   RCU_INIT_POINTER(mm->exe_file, get_mm_exe_file(oldmm));
+ 
+   mm->total_vm = oldmm->total_vm;
+   mm->data_vm = oldmm->data_vm;
+   mm->exec_vm = oldmm->exec_vm;
+   mm->stack_vm = oldmm->stack_vm;
+ 
+   rb_link = &mm->mm_rb.rb_node;
+   rb_parent = NULL;
+   pprev = &mm->mmap;
+   retval = ksm_fork(mm, oldmm);
+   if (retval)
+   goto out;
+   retval = khugepaged_fork(mm, oldmm);
+   if (retval)
+   goto out;
+ 
+   prev = NULL;
+   for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
+   struct file *file;
+ 
+   if (mpnt->vm_flags & VM_DONTCOPY) {
+   vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
+   continue;
+   }
+   charge = 0;
+   if (mpnt->vm_flags & VM_ACCOUNT) {
+   unsigned long len = vma_pages(mpnt);
+ 
+   if (security_vm_enough_memory_mm(oldmm, len)) /* sic */
+   goto fail_nomem;
+   charge = len;
+   }
+   tmp = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
+   if (!tmp)
+   goto fail_nomem;
+   *tmp = *mpnt;
+   INIT_LIST_HEAD(>anon_vma_chain);
+   retval = vma_dup_policy(mpnt, tmp);
+   if (retval)
+   goto fail_nomem_policy;
+   tmp->vm_mm = mm;
+   retval = dup_userfaultfd(tmp, &uf);
+   if (retval)
+   goto fail_nomem_anon_vma_fork;
+   if (tmp->vm_flags & VM_WIPEONFORK) {
+   /* VM_WIPEONFORK gets a clean slate in the child. */
+   tmp->anon_vma = NULL;
+   if (anon_vma_prepare(tmp))
+   goto fail_nomem_anon_vma_fork;
+   } else if (anon_vma_fork(tmp, mpnt))
+   goto fail_nomem_anon_vma_fork;
+   tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
+   tmp->vm_next = tmp->vm_prev = NULL;
+   file = tmp->vm_file;
+   if (file) {
+   struct inode *inode = file_inode(file);
+   struct address_space *mapping = file->f_mapping;
+ 
+   get_file(file);
+   if (tmp->vm_flags & VM_DENYWRITE)
+   atomic_dec(&inode->i_writecount);
+   i_mmap_lock_write(mapping);
+   if (tmp->vm_flags & VM_SHARED)
+   atomic_inc(&mapping->i_mmap_writable);
+   flush_dcache_mmap_lock(mapping);
+   /* insert tmp into the share list, just after mpnt */
+   vma_interval_tree_insert_after(tmp, mpnt,
+   &mapping->i_mmap);
+   flush_dcache_mmap_unlock(mapping);
+   i_mmap_unlock_write(mapping);
+   }
+ 
+   /*
+* Clear hugetlb-related page reserves for children. This only
+* affects MAP_PRIVATE mappings. Faults generated by the child
+* are not guaranteed to succeed, even if read-only
+   

linux-next: manual merge of the akpm-current tree with the tip tree

2017-11-09 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  kernel/softirq.c

between commit:

  f71b74bca637 ("irq/softirqs: Use lockdep to assert IRQs are disabled/enabled")

from the tip tree and commit:

  275f9389fa4e ("kmemcheck: rip it out")

from the akpm-current tree.

I fixed it up (the latter removed code modified by the former) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell


linux-next: manual merge of the akpm-current tree with the tip tree

2017-11-02 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  arch/x86/mm/kasan_init_64.c

between commit:

  12a8cc7fcf54 ("x86/kasan: Use the same shadow offset for 4- and 5-level paging")

from the tip tree and commit:

  3af83426c380 ("x86/kasan: add and use kasan_map_populate()")

from the akpm-current tree.

I fixed it up (hopefully - see below) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/x86/mm/kasan_init_64.c
index fe5760db7b19,9778fec8a5dc..
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@@ -15,8 -15,73 +15,75 @@@
  
  extern struct range pfn_mapped[E820_MAX_ENTRIES];
  
 +static p4d_t tmp_p4d_table[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE);
 +
+ /* Creates mappings for kasan during early boot. The mapped memory is zeroed */
+ static int __meminit kasan_map_populate(unsigned long start, unsigned long end,
+   int node)
+ {
+   unsigned long addr, pfn, next;
+   unsigned long long size;
+   pgd_t *pgd;
+   p4d_t *p4d;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+   int ret;
+ 
+   ret = vmemmap_populate(start, end, node);
+   /*
+* We might have partially populated memory, so check for no entries,
+* and zero only those that actually exist.
+*/
+   for (addr = start; addr < end; addr = next) {
+   pgd = pgd_offset_k(addr);
+   if (pgd_none(*pgd)) {
+   next = pgd_addr_end(addr, end);
+   continue;
+   }
+ 
+   p4d = p4d_offset(pgd, addr);
+   if (p4d_none(*p4d)) {
+   next = p4d_addr_end(addr, end);
+   continue;
+   }
+ 
+   pud = pud_offset(p4d, addr);
+   if (pud_none(*pud)) {
+   next = pud_addr_end(addr, end);
+   continue;
+   }
+   if (pud_large(*pud)) {
+   /* This is PUD size page */
+   next = pud_addr_end(addr, end);
+   size = PUD_SIZE;
+   pfn = pud_pfn(*pud);
+   } else {
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd)) {
+   next = pmd_addr_end(addr, end);
+   continue;
+   }
+   if (pmd_large(*pmd)) {
+   /* This is PMD size page */
+   next = pmd_addr_end(addr, end);
+   size = PMD_SIZE;
+   pfn = pmd_pfn(*pmd);
+   } else {
+   pte = pte_offset_kernel(pmd, addr);
+   next = addr + PAGE_SIZE;
+   if (pte_none(*pte))
+   continue;
+   /* This is base size page */
+   size = PAGE_SIZE;
+   pfn = pte_pfn(*pte);
+   }
+   }
+   memset(phys_to_virt(PFN_PHYS(pfn)), 0, size);
+   }
+   return ret;
+ }
+ 
  static int __init map_range(struct range *range)
  {
unsigned long start;


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-23 Thread Vlastimil Babka
On 08/22/2017 08:57 AM, Stephen Rothwell wrote:
> Hi Andrew,

Hi,

> Today's linux-next merge of the akpm-current tree got a conflict in:
> 
>   init/main.c
> 
> between commit:
> 
>   caba4cbbd27d ("debugobjects: Make kmemleak ignore debug objects")
> 
> from the tip tree and commit:
> 
>   50a7dc046b58 ("mm, page_ext: move page_ext_init() after page_alloc_init_late()")

This patch can be also dropped from mmotm. It was a RFC and review
suggested a different approach which I didn't get to try yet. (The other
patches in the series should be fine to stay in any case).

> from the akpm-current tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 



linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-22 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  init/main.c

between commit:

  caba4cbbd27d ("debugobjects: Make kmemleak ignore debug objects")

from the tip tree and commit:

  50a7dc046b58 ("mm, page_ext: move page_ext_init() after page_alloc_init_late()")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc init/main.c
index aea41cf8f9a3,c401e5a38af3..
--- a/init/main.c
+++ b/init/main.c
@@@ -658,9 -651,8 +659,8 @@@ asmlinkage __visible void __init start_
initrd_start = 0;
}
  #endif
-   page_ext_init();
 -  debug_objects_mem_init();
kmemleak_init();
 +  debug_objects_mem_init();
setup_per_cpu_pageset();
numa_policy_init();
if (late_time_init)


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-15 Thread Minchan Kim
On Mon, Aug 14, 2017 at 09:57:23PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 14, 2017 at 05:38:39PM +0900, Minchan Kim wrote:
> > memory-barrier.txt always scares me. I have read it for a while
> > and IIUC, it seems semantic of spin_unlock(_pte) would be
> > enough without some memory-barrier inside mm_tlb_flush_nested.
> 
> Indeed, see the email I just sent. It's both spin_lock() and
> spin_unlock() that we care about.
> 
> Aside from the semi permeable barrier of these primitives, RCpc ensures
> these orderings only work against the _same_ lock variable.
> 
> Let me try and explain the ordering for PPC (which is by far the worst
> we have in this regard):
> 
> 
> spin_lock(lock)
> {
>   while (test_and_set(lock))
>   cpu_relax();
>   lwsync();
> }
> 
> 
> spin_unlock(lock)
> {
>   lwsync();
>   clear(lock);
> }
> 
> Now LWSYNC has fairly 'simple' semantics, but with fairly horrible
> ramifications. Consider LWSYNC to provide _local_ TSO ordering, this
> means that it allows 'stores reordered after loads'.
> 
> For the spin_lock() that implies that all load/store's inside the lock
> do indeed stay in, but the ACQUIRE is only on the LOAD of the
> test_and_set(). That is, the actual _set_ can leak in. After all it can
> re-order stores after load (inside the lock).
> 
> For unlock it again means all load/store's prior stay prior, and the
> RELEASE is on the store clearing the lock state (nothing surprising
> here).
> 
> Now the _local_ part, the main take-away is that these orderings are
> strictly CPU local. What makes the spinlock work across CPUs (as we'd
> very much expect it to) is the address dependency on the lock variable.
> 
> In order for the spin_lock() to succeed, it must observe the clear. It's
> this link that crosses between the CPUs and builds the ordering. But
> only the two CPUs agree on this order. A third CPU not involved in
> this transaction can disagree on the order of events.

The detail explanation in your previous reply makes me comfortable
from scary memory-barrier.txt but this reply makes me scared again. ;-)

Thanks for the kind clarification, Peter!



Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-15 Thread Nadav Amit
Peter Zijlstra  wrote:

> On Mon, Aug 14, 2017 at 05:07:19AM +, Nadav Amit wrote:
 So I'm not entirely clear about this yet.
 
 How about:
 
 
CPU0CPU1
 
tlb_gather_mmu()
 
lock PTLn
no mod
unlock PTLn
 
tlb_gather_mmu()
 
lock PTLm
mod
include in tlb range
unlock PTLm
 
lock PTLn
mod
unlock PTLn
 
tlb_finish_mmu()
  force = mm_tlb_flush_nested(tlb->mm);
  arch_tlb_finish_mmu(force);
 
 
... more ...
 
tlb_finish_mmu()
 
 
 
 In this case you also want CPU1's mm_tlb_flush_nested() call to return
 true, right?
>>> 
>>> No, because CPU 1 modified pte and added it into tlb range
>>> so regardless of nested, it will flush TLB so there is no stale
>>> TLB problem.
> 
>> To clarify: the main problem that these patches address is when the first
>> CPU updates the PTE, and second CPU sees the updated value and thinks: “the
>> PTE is already what I wanted - no flush is needed”.
> 
> OK, that simplifies things.
> 
>> For some reason (I would assume intentional), all the examples here first
>> “do not modify” the PTE, and then modify it - which is not an “interesting”
>> case.
> 
> Depends on what you call 'interesting' :-) They are 'interesting' to
> make work from a memory ordering POV. And since I didn't get they were
> excluded from the set, I worried.
> 
> In fact, if they were to be included, I couldn't make it work at all. So
> I'm really glad to hear we can disregard them.
> 
>> However, based on what I understand on the memory barriers, I think
>> there is indeed a missing barrier before reading it in
>> mm_tlb_flush_nested(). IIUC using smp_mb__after_unlock_lock() in this case,
>> before reading, would solve the problem with least impact on systems with
>> strong memory ordering.
> 
> No, all is well. If, as you say, we're naturally constrained to the case
> where we only care about prior modification we can rely on the RCpc PTL
> locks.
> 
> Consider:
> 
> 
>   CPU0CPU1
> 
>   tlb_gather_mmu()
> 
>   tlb_gather_mmu()
> inc   .
>   | (inc is constrained by RELEASE)
>   lock PTLn   |
>   mod ^
>   unlock PTLn ->  lock PTLn
>   v   no mod
>   |   unlock PTLn
>   |
>   |   lock PTLm
>   |   mod
>   |   include in tlb range
>   |   unlock PTLm
>   |
>   (read is constrained|
> by ACQUIRE)   |
>   |   tlb_finish_mmu()
>   ` force = mm_tlb_flush_nested(tlb->mm);
> arch_tlb_finish_mmu(force);
> 
> 
>   ... more ...
> 
>   tlb_finish_mmu()
> 
> 
> Then CPU1's acquire of PTLn orders against CPU0's release of that same
> PTLn which guarantees we observe both its (prior) modified PTE and the
> mm->tlb_flush_pending increment from tlb_gather_mmu().
> 
> So all we need for mm_tlb_flush_nested() to work is having acquired the
> right PTL at least once before calling it.
> 
> At the same time, the decrements need to be after the TLB invalidate is
> complete, this ensures that _IF_ we observe the decrement, we must've
> also observed the corresponding invalidate.
> 
> Something like the below is then sufficient.
> 
> ---
> Subject: mm: Clarify tlb_flush_pending barriers
> From: Peter Zijlstra 
> Date: Fri, 11 Aug 2017 16:04:50 +0200
> 
> Better document the ordering around tlb_flush_pending.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
> include/linux/mm_types.h |   78 
> +++
> 1 file changed, 45 insertions(+), 33 deletions(-)
> 
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -526,30 +526,6 @@ extern void tlb_gather_mmu(struct mmu_ga
> extern void tlb_finish_mmu(struct mmu_gather *tlb,
>   unsigned long start, unsigned long end);
> 
> -/*
> - * Memory barriers to keep this state in sync are graciously provided by
> - * the page table locks, outside of which no page table modifications happen.
> - * The barriers are used to ensure the 


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-14 Thread Peter Zijlstra
On Mon, Aug 14, 2017 at 05:38:39PM +0900, Minchan Kim wrote:
> memory-barrier.txt always scares me. I have read it for a while
> and IIUC, it seems semantic of spin_unlock(_pte) would be
> enough without some memory-barrier inside mm_tlb_flush_nested.

Indeed, see the email I just sent. It's both spin_lock() and
spin_unlock() that we care about.

Aside from the semi permeable barrier of these primitives, RCpc ensures
these orderings only work against the _same_ lock variable.

Let me try and explain the ordering for PPC (which is by far the worst
we have in this regard):


spin_lock(lock)
{
while (test_and_set(lock))
cpu_relax();
lwsync();
}


spin_unlock(lock)
{
lwsync();
clear(lock);
}

Now LWSYNC has fairly 'simple' semantics, but with fairly horrible
ramifications. Consider LWSYNC to provide _local_ TSO ordering, this
means that it allows 'stores reordered after loads'.

For the spin_lock() that implies that all load/store's inside the lock
do indeed stay in, but the ACQUIRE is only on the LOAD of the
test_and_set(). That is, the actual _set_ can leak in. After all it can
re-order stores after load (inside the lock).

For unlock it again means all load/store's prior stay prior, and the
RELEASE is on the store clearing the lock state (nothing surprising
here).

Now the _local_ part, the main take-away is that these orderings are
strictly CPU local. What makes the spinlock work across CPUs (as we'd
very much expect it to) is the address dependency on the lock variable.

In order for the spin_lock() to succeed, it must observe the clear. It's
this link that crosses between the CPUs and builds the ordering. But
only the two CPUs agree on this order. A third CPU not involved in
this transaction can disagree on the order of events.


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-14 Thread Peter Zijlstra
On Mon, Aug 14, 2017 at 05:07:19AM +, Nadav Amit wrote:
> >> So I'm not entirely clear about this yet.
> >> 
> >> How about:
> >> 
> >> 
> >>CPU0CPU1
> >> 
> >>tlb_gather_mmu()
> >> 
> >>lock PTLn
> >>no mod
> >>unlock PTLn
> >> 
> >>tlb_gather_mmu()
> >> 
> >>lock PTLm
> >>mod
> >>include in tlb range
> >>unlock PTLm
> >> 
> >>lock PTLn
> >>mod
> >>unlock PTLn
> >> 
> >>tlb_finish_mmu()
> >>  force = mm_tlb_flush_nested(tlb->mm);
> >>  arch_tlb_finish_mmu(force);
> >> 
> >> 
> >>... more ...
> >> 
> >>tlb_finish_mmu()
> >> 
> >> 
> >> 
> >> In this case you also want CPU1's mm_tlb_flush_nested() call to return
> >> true, right?
> > 
> > No, because CPU 1 modified the pte and added it into the tlb range,
> > so regardless of nesting, it will flush the TLB and there is no stale
> > TLB problem.

> To clarify: the main problem that these patches address is when the first
> CPU updates the PTE, and second CPU sees the updated value and thinks: “the
> PTE is already what I wanted - no flush is needed”.

OK, that simplifies things.

> For some reason (I would assume intentional), all the examples here first
> “do not modify” the PTE, and then modify it - which is not an “interesting”
> case.

Depends on what you call 'interesting' :-) They are 'interesting' to
make work from a memory ordering POV. And since I didn't realize they were
excluded from the set, I worried.

In fact, if they were to be included, I couldn't make it work at all. So
I'm really glad to hear we can disregard them.

> However, based on what I understand on the memory barriers, I think
> there is indeed a missing barrier before reading it in
> mm_tlb_flush_nested(). IIUC using smp_mb__after_unlock_lock() in this case,
> before reading, would solve the problem with least impact on systems with
> strong memory ordering.

No, all is well. If, as you say, we're naturally constrained to the case
where we only care about prior modification we can rely on the RCpc PTL
locks.

Consider:


CPU0CPU1

tlb_gather_mmu()

tlb_gather_mmu()
  inc   .
| (inc is constrained by RELEASE)
lock PTLn   |
mod ^
unlock PTLn ->  lock PTLn
v   no mod
|   unlock PTLn
|
|   lock PTLm
|   mod
|   include in tlb range
|   unlock PTLm
|
(read is constrained|
  by ACQUIRE)   |
|   tlb_finish_mmu()
` force = mm_tlb_flush_nested(tlb->mm);
  arch_tlb_finish_mmu(force);


... more ...

tlb_finish_mmu()


Then CPU1's acquire of PTLn orders against CPU0's release of that same
PTLn which guarantees we observe both its (prior) modified PTE and the
mm->tlb_flush_pending increment from tlb_gather_mmu().

So all we need for mm_tlb_flush_nested() to work is having acquired the
right PTL at least once before calling it.

At the same time, the decrements need to be after the TLB invalidate is
complete; this ensures that _IF_ we observe the decrement, we must've
also observed the corresponding invalidate.

Something like the below is then sufficient.

---
Subject: mm: Clarify tlb_flush_pending barriers
From: Peter Zijlstra 
Date: Fri, 11 Aug 2017 16:04:50 +0200

Better document the ordering around tlb_flush_pending.

Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/mm_types.h |   78 +++
 1 file changed, 45 insertions(+), 33 deletions(-)

--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -526,30 +526,6 @@ extern void tlb_gather_mmu(struct mmu_ga
 extern void tlb_finish_mmu(struct mmu_gather *tlb,
unsigned long start, unsigned long end);
 
-/*
- * Memory barriers to keep this state in sync are graciously provided by
- * the page table locks, outside of which no page table modifications happen.
- * The barriers are used to ensure the order between tlb_flush_pending updates,
- * which happen while the lock is not taken, and the PTE updates, which happen
- * while the lock is taken, are serialized.
- */
-static 

Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-14 Thread Peter Zijlstra
On Mon, Aug 14, 2017 at 12:09:14PM +0900, Minchan Kim wrote:
> @@ -446,9 +450,7 @@ void tlb_finish_mmu(struct mmu_gather *tlb,
>*
>*/
>   bool force = mm_tlb_flush_nested(tlb->mm);
> -
>   arch_tlb_finish_mmu(tlb, start, end, force);
> - dec_tlb_flush_pending(tlb->mm);
>  }

No, I think this breaks all the mm_tlb_flush_pending() users. They need
the decrement to not be visible until the TLB flush is complete.


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-14 Thread Minchan Kim
Hi Nadav,

On Mon, Aug 14, 2017 at 05:07:19AM +, Nadav Amit wrote:
< snip >

> For some reason (I would assume intentional), all the examples here first
> “do not modify” the PTE, and then modify it - which is not an “interesting”
> case. However, based on what I understand on the memory barriers, I think
> there is indeed a missing barrier before reading it in
> mm_tlb_flush_nested(). IIUC using smp_mb__after_unlock_lock() in this case,

memory-barrier.txt always scares me. I have read it for a while
and IIUC, it seems the semantics of spin_unlock(_pte) would be
enough without some memory barrier inside mm_tlb_flush_nested.

I may well be missing something totally.

Could you explain what kinds of sequence you have in mind to
have such problem?

Thanks.


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-13 Thread Minchan Kim
On Mon, Aug 14, 2017 at 05:07:19AM +, Nadav Amit wrote:
< snip >

> Minchan, as for the solution you proposed, it seems to open again a race,
> since the “pending” indication is removed before the actual TLB flush is
> performed.

Oops, you're right!


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-13 Thread Nadav Amit
Minchan Kim  wrote:

> On Sun, Aug 13, 2017 at 02:50:19PM +0200, Peter Zijlstra wrote:
>> On Sun, Aug 13, 2017 at 06:06:32AM +, Nadav Amit wrote:
 however mm_tlb_flush_nested() is a mystery, it appears to care about
 anything inside the range. For now rely on it doing at least _a_ PTL
 lock instead of taking  _the_ PTL lock.
>>> 
>>> It does not care about “anything” inside the range, but only on situations
>>> in which there is at least one (same) PT that was modified by one core and
>>> then read by the other. So, yes, it will always be _the_ same PTL, and not
>>> _a_ PTL - in the cases that flush is really needed.
>>> 
>>> The issue that might require additional barriers is that
>>> inc_tlb_flush_pending() and mm_tlb_flush_nested() are called when the PTL is
>>> not held. IIUC, since the release-acquire might not behave as a full memory
>>> barrier, this requires an explicit memory barrier.
>> 
>> So I'm not entirely clear about this yet.
>> 
>> How about:
>> 
>> 
>>  CPU0CPU1
>> 
>>  tlb_gather_mmu()
>> 
>>  lock PTLn
>>  no mod
>>  unlock PTLn
>> 
>>  tlb_gather_mmu()
>> 
>>  lock PTLm
>>  mod
>>  include in tlb range
>>  unlock PTLm
>> 
>>  lock PTLn
>>  mod
>>  unlock PTLn
>> 
>>  tlb_finish_mmu()
>>force = mm_tlb_flush_nested(tlb->mm);
>>arch_tlb_finish_mmu(force);
>> 
>> 
>>  ... more ...
>> 
>>  tlb_finish_mmu()
>> 
>> 
>> 
>> In this case you also want CPU1's mm_tlb_flush_nested() call to return
>> true, right?
> 
> No, because CPU 1 modified the pte and added it into the tlb range,
> so regardless of nesting, it will flush the TLB and there is no stale
> TLB problem.
> 
>> But even with an smp_mb__after_atomic() at CPU0's tlb_gather_mmu()
>> you're not guaranteed CPU1 sees the increment. The only way to do that
>> is to make the PTL locks RCsc and that is a much more expensive
>> proposition.
>> 
>> 
>> What about:
>> 
>> 
>>  CPU0CPU1
>> 
>>  tlb_gather_mmu()
>> 
>>  lock PTLn
>>  no mod
>>  unlock PTLn
>> 
>> 
>>  lock PTLm
>>  mod
>>  include in tlb range
>>  unlock PTLm
>> 
>>  tlb_gather_mmu()
>> 
>>  lock PTLn
>>  mod
>>  unlock PTLn
>> 
>>  tlb_finish_mmu()
>>force = mm_tlb_flush_nested(tlb->mm);
>>arch_tlb_finish_mmu(force);
>> 
>> 
>>  ... more ...
>> 
>>  tlb_finish_mmu()
>> 
>> Do we want CPU1 to see it here? If so, where does it end?
> 
> Ditto. Since CPU 1 has added range, it will flush TLB regardless
> of nested condition.
> 
>> CPU0 CPU1
>> 
>>  tlb_gather_mmu()
>> 
>>  lock PTLn
>>  no mod
>>  unlock PTLn
>> 
>> 
>>  lock PTLm
>>  mod
>>  include in tlb range
>>  unlock PTLm
>> 
>>  tlb_finish_mmu()
>>force = mm_tlb_flush_nested(tlb->mm);
>> 
>>  tlb_gather_mmu()
>> 
>>  lock PTLn
>>  mod
>>  unlock PTLn
>> 
>>arch_tlb_finish_mmu(force);
>> 
>> 
>>  ... more ...
>> 
>>  tlb_finish_mmu()
>> 
>> 
>> This?
>> 
>> 
>> Could you clarify under what exact condition mm_tlb_flush_nested() must
>> return true?
> 
> mm_tlb_flush_nested aims for the CPU side where there is no pte update
> but need TLB flush.
> As I wrote 
> https://marc.info/?l=linux-mm&m=150267398226529&w=2,
> it has a stale TLB problem if we don't flush the TLB although there is no
> pte modification.

To clarify: the main problem that these patches address is when the first
CPU updates the PTE, and second CPU sees the updated value and thinks: “the
PTE is already what I wanted - no flush is needed”.

For some reason (I would assume intentional), all the 

Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-13 Thread Nadav Amit
Minchan Kim  wrote:

> On Sun, Aug 13, 2017 at 02:50:19PM +0200, Peter Zijlstra wrote:
>> On Sun, Aug 13, 2017 at 06:06:32AM +, Nadav Amit wrote:
 however mm_tlb_flush_nested() is a mystery, it appears to care about
 anything inside the range. For now rely on it doing at least _a_ PTL
 lock instead of taking  _the_ PTL lock.
>>> 
>>> It does not care about “anything” inside the range, but only on situations
>>> in which there is at least one (same) PT that was modified by one core and
>>> then read by the other. So, yes, it will always be _the_ same PTL, and not
>>> _a_ PTL - in the cases that flush is really needed.
>>> 
>>> The issue that might require additional barriers is that
>>> inc_tlb_flush_pending() and mm_tlb_flush_nested() are called when the PTL is
>>> not held. IIUC, since the release-acquire might not behave as a full memory
>>> barrier, this requires an explicit memory barrier.
>> 
>> So I'm not entirely clear about this yet.
>> 
>> How about:
>> 
>> 
>>  CPU0CPU1
>> 
>>  tlb_gather_mmu()
>> 
>>  lock PTLn
>>  no mod
>>  unlock PTLn
>> 
>>  tlb_gather_mmu()
>> 
>>  lock PTLm
>>  mod
>>  include in tlb range
>>  unlock PTLm
>> 
>>  lock PTLn
>>  mod
>>  unlock PTLn
>> 
>>  tlb_finish_mmu()
>>force = mm_tlb_flush_nested(tlb->mm);
>>arch_tlb_finish_mmu(force);
>> 
>> 
>>  ... more ...
>> 
>>  tlb_finish_mmu()
>> 
>> 
>> 
>> In this case you also want CPU1's mm_tlb_flush_nested() call to return
>> true, right?
> 
> No, because CPU 1 mofified pte and added it into tlb range
> so regardless of nested, it will flush TLB so there is no stale
> TLB problem.
> 
>> But even with an smp_mb__after_atomic() at CPU0's tlg_bather_mmu()
>> you're not guaranteed CPU1 sees the increment. The only way to do that
>> is to make the PTL locks RCsc and that is a much more expensive
>> proposition.
>> 
>> 
>> What about:
>> 
>> 
>>  CPU0CPU1
>> 
>>  tlb_gather_mmu()
>> 
>>  lock PTLn
>>  no mod
>>  unlock PTLn
>> 
>> 
>>  lock PTLm
>>  mod
>>  include in tlb range
>>  unlock PTLm
>> 
>>  tlb_gather_mmu()
>> 
>>  lock PTLn
>>  mod
>>  unlock PTLn
>> 
>>  tlb_finish_mmu()
>>force = mm_tlb_flush_nested(tlb->mm);
>>arch_tlb_finish_mmu(force);
>> 
>> 
>>  ... more ...
>> 
>>  tlb_finish_mmu()
>> 
>> Do we want CPU1 to see it here? If so, where does it end?
> 
> Ditto. Since CPU 1 has added range, it will flush TLB regardless
> of nested condition.
> 
>> CPU0 CPU1
>> 
>>  tlb_gather_mmu()
>> 
>>  lock PTLn
>>  no mod
>>  unlock PTLn
>> 
>> 
>>  lock PTLm
>>  mod
>>  include in tlb range
>>  unlock PTLm
>> 
>>  tlb_finish_mmu()
>>force = mm_tlb_flush_nested(tlb->mm);
>> 
>>  tlb_gather_mmu()
>> 
>>  lock PTLn
>>  mod
>>  unlock PTLn
>> 
>>arch_tlb_finish_mmu(force);
>> 
>> 
>>  ... more ...
>> 
>>  tlb_finish_mmu()
>> 
>> 
>> This?
>> 
>> 
>> Could you clarify under what exact condition mm_tlb_flush_nested() must
>> return true?
> 
> mm_tlb_flush_nested aims for the CPU side where there is no pte update
> but need TLB flush.
> As I wrote 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__marc.info_-3Fl-3Dlinux-2Dmm-26m-3D150267398226529-26w-3D2=DwIDaQ=uilaK90D4TOVoH58JNXRgQ=x9zhXCtCLvTDtvE65-BGSA=v2Z7eDi7z1H9zdngcjZvlNeBudWzA9KvcXFNpU2A77s=amaSu_gurmBHHPcl3Pxfdl0Tk_uTnmf60tMQAsNDHVU=
>  ,
> it has stable TLB problem if we don't flush TLB although there is no
> pte modification.

To clarify: the main problem that these patches address is when the first
CPU updates the PTE, and second CPU sees the updated value and thinks: “the
PTE is already what I wanted - no flush is needed”.

For some reason (I would assume intentional), all the examples here first
“do 

Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-13 Thread Minchan Kim
On Sun, Aug 13, 2017 at 02:50:19PM +0200, Peter Zijlstra wrote:
> On Sun, Aug 13, 2017 at 06:06:32AM +, Nadav Amit wrote:
> > > however mm_tlb_flush_nested() is a mystery, it appears to care about
> > > anything inside the range. For now rely on it doing at least _a_ PTL
> > > lock instead of taking  _the_ PTL lock.
> > 
> > It does not care about “anything” inside the range, but only on situations
> > in which there is at least one (same) PT that was modified by one core and
> > then read by the other. So, yes, it will always be _the_ same PTL, and not
> > _a_ PTL - in the cases that flush is really needed.
> > 
> > The issue that might require additional barriers is that
> > inc_tlb_flush_pending() and mm_tlb_flush_nested() are called when the PTL is
> > not held. IIUC, since the release-acquire might not behave as a full memory
> > barrier, this requires an explicit memory barrier.
> 
> So I'm not entirely clear about this yet.
> 
> How about:
> 
> 
>   CPU0CPU1
> 
>   tlb_gather_mmu()
> 
>   lock PTLn
>   no mod
>   unlock PTLn
> 
>   tlb_gather_mmu()
> 
>   lock PTLm
>   mod
>   include in tlb range
>   unlock PTLm
> 
>   lock PTLn
>   mod
>   unlock PTLn
> 
>   tlb_finish_mmu()
> force = mm_tlb_flush_nested(tlb->mm);
> arch_tlb_finish_mmu(force);
> 
> 
>   ... more ...
> 
>   tlb_finish_mmu()
> 
> 
> 
> In this case you also want CPU1's mm_tlb_flush_nested() call to return
> true, right?

No, because CPU 1 modified the pte and added it into the tlb range,
so regardless of nesting, it will flush the TLB and there is no stale
TLB problem.

> 
> But even with an smp_mb__after_atomic() at CPU0's tlb_gather_mmu()
> you're not guaranteed CPU1 sees the increment. The only way to do that
> is to make the PTL locks RCsc and that is a much more expensive
> proposition.
> 
> 
> What about:
> 
> 
>   CPU0CPU1
> 
>   tlb_gather_mmu()
> 
>   lock PTLn
>   no mod
>   unlock PTLn
> 
> 
>   lock PTLm
>   mod
>   include in tlb range
>   unlock PTLm
> 
>   tlb_gather_mmu()
> 
>   lock PTLn
>   mod
>   unlock PTLn
> 
>   tlb_finish_mmu()
> force = mm_tlb_flush_nested(tlb->mm);
> arch_tlb_finish_mmu(force);
> 
> 
>   ... more ...
> 
>   tlb_finish_mmu()
> 
> Do we want CPU1 to see it here? If so, where does it end?

Ditto. Since CPU 1 has added range, it will flush TLB regardless
of nested condition.

> 
>   CPU0CPU1
> 
>   tlb_gather_mmu()
> 
>   lock PTLn
>   no mod
>   unlock PTLn
> 
> 
>   lock PTLm
>   mod
>   include in tlb range
>   unlock PTLm
> 
>   tlb_finish_mmu()
> force = mm_tlb_flush_nested(tlb->mm);
> 
>   tlb_gather_mmu()
> 
>   lock PTLn
>   mod
>   unlock PTLn
> 
> arch_tlb_finish_mmu(force);
> 
> 
>   ... more ...
> 
>   tlb_finish_mmu()
> 
> 
> This?
> 
> 
> Could you clarify under what exact condition mm_tlb_flush_nested() must
> return true?

mm_tlb_flush_nested aims for the CPU side where there is no pte update
but need TLB flush.
As I wrote https://marc.info/?l=linux-mm&m=150267398226529&w=2,
it has a stale TLB problem if we don't flush the TLB although there is no
pte modification.


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-13 Thread Minchan Kim
Hi Peter,

On Fri, Aug 11, 2017 at 04:04:50PM +0200, Peter Zijlstra wrote:
> 
> Ok, so I have the below to still go on-top.
> 
> Ideally someone would clarify the situation around
> mm_tlb_flush_nested(), because ideally we'd remove the
> smp_mb__after_atomic() and go back to relying on PTL alone.
> 
> This also removes the pointless smp_mb__before_atomic()

I'm not an expert on barrier stuff, but IIUC, the full memory barrier
on the mm_tlb_flush_nested side can go with removing smp_mb__after_atomic
on the inc_tlb_flush_pending side?


diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 490af494c2da..5ad0e66df363 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -544,7 +544,12 @@ static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
  */
 static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
 {
-   return atomic_read(&mm->tlb_flush_pending) > 1;
+   /*
+* atomic_dec_and_test's full memory barrier guarantees
+* to see an up-to-date tlb_flush_pending count on other CPUs
+* without relying on page table lock.
+*/
+   return !atomic_dec_and_test(&mm->tlb_flush_pending);
 }
 
 static inline void init_tlb_flush_pending(struct mm_struct *mm)
diff --git a/mm/memory.c b/mm/memory.c
index f571b0eb9816..e90b57bc65fb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -407,6 +407,10 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
unsigned long start, unsigned long end)
 {
arch_tlb_gather_mmu(tlb, mm, start, end);
+   /*
+* counterpart is mm_tlb_flush_nested in tlb_finish_mmu
+* which decreases pending count.
+*/
inc_tlb_flush_pending(tlb->mm);
 }
 
@@ -446,9 +450,7 @@ void tlb_finish_mmu(struct mmu_gather *tlb,
 *
 */
bool force = mm_tlb_flush_nested(tlb->mm);
-
arch_tlb_finish_mmu(tlb, start, end, force);
-   dec_tlb_flush_pending(tlb->mm);
 }
 
 /*


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-13 Thread Peter Zijlstra
On Sun, Aug 13, 2017 at 06:06:32AM +, Nadav Amit wrote:
> > however mm_tlb_flush_nested() is a mystery, it appears to care about
> > anything inside the range. For now rely on it doing at least _a_ PTL
> > lock instead of taking  _the_ PTL lock.
> 
> It does not care about “anything” inside the range, but only on situations
> in which there is at least one (same) PT that was modified by one core and
> then read by the other. So, yes, it will always be _the_ same PTL, and not
> _a_ PTL - in the cases that flush is really needed.
> 
> The issue that might require additional barriers is that
> inc_tlb_flush_pending() and mm_tlb_flush_nested() are called when the PTL is
> not held. IIUC, since the release-acquire might not behave as a full memory
> barrier, this requires an explicit memory barrier.

So I'm not entirely clear about this yet.

How about:


CPU0CPU1

tlb_gather_mmu()

lock PTLn
no mod
unlock PTLn

tlb_gather_mmu()

lock PTLm
mod
include in tlb range
unlock PTLm

lock PTLn
mod
unlock PTLn

tlb_finish_mmu()
  force = mm_tlb_flush_nested(tlb->mm);
  arch_tlb_finish_mmu(force);


... more ...

tlb_finish_mmu()



In this case you also want CPU1's mm_tlb_flush_nested() call to return
true, right?

But even with an smp_mb__after_atomic() at CPU0's tlb_gather_mmu()
you're not guaranteed CPU1 sees the increment. The only way to do that
is to make the PTL locks RCsc and that is a much more expensive
proposition.


What about:


CPU0CPU1

tlb_gather_mmu()

lock PTLn
no mod
unlock PTLn


lock PTLm
mod
include in tlb range
unlock PTLm

tlb_gather_mmu()

lock PTLn
mod
unlock PTLn

tlb_finish_mmu()
  force = mm_tlb_flush_nested(tlb->mm);
  arch_tlb_finish_mmu(force);


... more ...

tlb_finish_mmu()

Do we want CPU1 to see it here? If so, where does it end?


CPU0CPU1

tlb_gather_mmu()

lock PTLn
no mod
unlock PTLn


lock PTLm
mod
include in tlb range
unlock PTLm

tlb_finish_mmu()
  force = mm_tlb_flush_nested(tlb->mm);

tlb_gather_mmu()

lock PTLn
mod
unlock PTLn

  arch_tlb_finish_mmu(force);


... more ...

tlb_finish_mmu()


This?


Could you clarify under what exact condition mm_tlb_flush_nested() must
return true?


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-13 Thread Nadav Amit
Peter Zijlstra  wrote:

> 
> Ok, so I have the below to still go on-top.
> 
> Ideally someone would clarify the situation around
> mm_tlb_flush_nested(), because ideally we'd remove the
> smp_mb__after_atomic() and go back to relying on PTL alone.
> 
> This also removes the pointless smp_mb__before_atomic()
> 
> ---
> Subject: mm: Fix barriers for the tlb_flush_pending thing
> From: Peter Zijlstra 
> Date: Fri Aug 11 12:43:33 CEST 2017
> 
> I'm not 100% sure we always care about the same PTL and when we have
> SPLIT_PTE_PTLOCKS and have RCpc locks (PPC) the UNLOCK of one does not
> in fact order against the LOCK of another lock. Therefore the
> documented scheme does not work if we care about multiple PTLs
> 
> mm_tlb_flush_pending() appears to only care about a single PTL:
> 
> - arch pte_accessible() (x86, arm64) only cares about that one PTE.
> - do_huge_pmd_numa_page() also only cares about a single (huge) page.
> - ksm write_protect_page() also only cares about a single page.
> 
> however mm_tlb_flush_nested() is a mystery, it appears to care about
> anything inside the range. For now rely on it doing at least _a_ PTL
> lock instead of taking  _the_ PTL lock.

It does not care about “anything” inside the range, but only on situations
in which there is at least one (same) PT that was modified by one core and
then read by the other. So, yes, it will always be _the_ same PTL, and not
_a_ PTL - in the cases that flush is really needed.

The issue that might require additional barriers is that
inc_tlb_flush_pending() and mm_tlb_flush_nested() are called when the PTL is
not held. IIUC, since the release-acquire might not behave as a full memory
barrier, this requires an explicit memory barrier.

> Therefore add an explicit smp_mb__after_atomic() to cure things.
> 
> Also remove the smp_mb__before_atomic() on the dec side, as it's
> completely pointless. We must rely on flush_tlb_range() to DTRT.

Good. It seemed fishy to me, but I was focused on the TLB consistency and
less on the barriers (that’s my excuse).

Nadav




Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-11 Thread Peter Zijlstra

Ok, so I have the below to still go on-top.

Ideally someone would clarify the situation around
mm_tlb_flush_nested(), because ideally we'd remove the
smp_mb__after_atomic() and go back to relying on PTL alone.

This also removes the pointless smp_mb__before_atomic()

---
Subject: mm: Fix barriers for the tlb_flush_pending thing
From: Peter Zijlstra 
Date: Fri Aug 11 12:43:33 CEST 2017

I'm not 100% sure we always care about the same PTL and when we have
SPLIT_PTE_PTLOCKS and have RCpc locks (PPC) the UNLOCK of one does not
in fact order against the LOCK of another lock. Therefore the
documented scheme does not work if we care about multiple PTLs

mm_tlb_flush_pending() appears to only care about a single PTL:

 - arch pte_accessible() (x86, arm64) only cares about that one PTE.
 - do_huge_pmd_numa_page() also only cares about a single (huge) page.
 - ksm write_protect_page() also only cares about a single page.

however mm_tlb_flush_nested() is a mystery, it appears to care about
anything inside the range. For now rely on it doing at least _a_ PTL
lock instead of taking  _the_ PTL lock.

Therefore add an explicit smp_mb__after_atomic() to cure things.

Also remove the smp_mb__before_atomic() on the dec side, as it's
completely pointless. We must rely on flush_tlb_range() to DTRT.

Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/mm_types.h |   38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -537,13 +537,13 @@ static inline bool mm_tlb_flush_pending(
 {
/*
 * Must be called with PTL held; such that our PTL acquire will have
-* observed the store from set_tlb_flush_pending().
+* observed the increment from inc_tlb_flush_pending().
 */
-   return atomic_read(&mm->tlb_flush_pending) > 0;
+   return atomic_read(&mm->tlb_flush_pending);
 }
 
 /*
- * Returns true if there are two above TLB batching threads in parallel.
+ * Returns true if there are two or more TLB batching threads in parallel.
  */
 static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
 {
@@ -558,15 +558,12 @@ static inline void init_tlb_flush_pendin
 static inline void inc_tlb_flush_pending(struct mm_struct *mm)
 {
atomic_inc(&mm->tlb_flush_pending);
-
/*
-* The only time this value is relevant is when there are indeed pages
-* to flush. And we'll only flush pages after changing them, which
-* requires the PTL.
-*
 * So the ordering here is:
 *
 *  atomic_inc(&mm->tlb_flush_pending);
+*  smp_mb__after_atomic();
+*
 *  spin_lock();
 *  ...
 *  set_pte_at();
@@ -580,21 +577,30 @@ static inline void inc_tlb_flush_pending
 *  flush_tlb_range();
 *  atomic_dec(&mm->tlb_flush_pending);
 *
-* So the =true store is constrained by the PTL unlock, and the =false
-* store is constrained by the TLB invalidate.
+* Where we order the increment against the PTE modification with the
+* smp_mb__after_atomic(). It would appear that the spin_unlock()
+* is sufficient to constrain the inc, because we only care about the
+* value if there is indeed a pending PTE modification. However with
+* SPLIT_PTE_PTLOCKS and RCpc locks (PPC) the UNLOCK of one lock does
+* not order against the LOCK of another lock.
+*
+* The decrement is ordered by the flush_tlb_range(), such that
+* mm_tlb_flush_pending() will not return false unless all flushes have
+* completed.
 */
+   smp_mb__after_atomic();
 }
 
-/* Clearing is done after a TLB flush, which also provides a barrier. */
 static inline void dec_tlb_flush_pending(struct mm_struct *mm)
 {
/*
-* Guarantee that the tlb_flush_pending does not not leak into the
-* critical section, since we must order the PTE change and changes to
-* the pending TLB flush indication. We could have relied on TLB flush
-* as a memory barrier, but this behavior is not clearly documented.
+* See inc_tlb_flush_pending().
+*
+* This cannot be smp_mb__before_atomic() because smp_mb() simply does
+* not order against TLB invalidate completion, which is what we need.
+*
+* Therefore we must rely on tlb_flush_*() to guarantee order.
 */
-   smp_mb__before_atomic();
atomic_dec(&mm->tlb_flush_pending);
 }
 


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-11 Thread Stephen Rothwell
Hi Ingo,

On Fri, 11 Aug 2017 14:44:25 +0200 Ingo Molnar  wrote:
>
> * Peter Zijlstra  wrote:
> 
> > On Fri, Aug 11, 2017 at 01:56:07PM +0200, Ingo Molnar wrote:  
> > > I've done a minimal conflict resolution merge locally. Peter, could you 
> > > please 
> > > double check my resolution, in:
> > > 
> > >   040cca3ab2f6: Merge branch 'linus' into locking/core, to resolve 
> > > conflicts  
> > 
> > That merge is a bit wonky, but not terminally broken afaict.
> > 
> > It now does two TLB flushes, the below cleans that up.  
> 
> Cool, thanks - I've applied it as a separate commit, to reduce the evilness 
> of the 
> merge commit.
> 
> Will push it all out in time to make Stephen's Monday morning a bit less of a 
> Monday morning.

Thank you very much.

-- 
Cheers,
Stephen Rothwell


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-11 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Fri, Aug 11, 2017 at 01:56:07PM +0200, Ingo Molnar wrote:
> > I've done a minimal conflict resolution merge locally. Peter, could you 
> > please 
> > double check my resolution, in:
> > 
> >   040cca3ab2f6: Merge branch 'linus' into locking/core, to resolve conflicts
> 
> That merge is a bit wonky, but not terminally broken afaict.
> 
> It now does two TLB flushes, the below cleans that up.

Cool, thanks - I've applied it as a separate commit, to reduce the evilness of
the merge commit.

Will push it all out in time to make Stephen's Monday morning a bit less of a 
Monday morning.

Thanks,

Ingo


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-11 Thread Peter Zijlstra
On Fri, Aug 11, 2017 at 01:56:07PM +0200, Ingo Molnar wrote:
> I've done a minimal conflict resolution merge locally. Peter, could you 
> please 
> double check my resolution, in:
> 
>   040cca3ab2f6: Merge branch 'linus' into locking/core, to resolve conflicts

That merge is a bit wonky, but not terminally broken afaict.

It now does two TLB flushes, the below cleans that up.

---
 mm/huge_memory.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ce883459e246..08f6c1993832 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1410,7 +1410,6 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
int page_nid = -1, this_nid = numa_node_id();
int target_nid, last_cpupid = -1;
-   bool need_flush = false;
bool page_locked;
bool migrated = false;
bool was_writable;
@@ -1497,22 +1496,18 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
}
 
/*
-* The page_table_lock above provides a memory barrier
-* with change_protection_range.
-*/
-   if (mm_tlb_flush_pending(vma->vm_mm))
-   flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
-
-   /*
 * Since we took the NUMA fault, we must have observed the !accessible
 * bit. Make sure all other CPUs agree with that, to avoid them
 * modifying the page we're about to migrate.
 *
 * Must be done under PTL such that we'll observe the relevant
-* set_tlb_flush_pending().
+* inc_tlb_flush_pending().
+*
+* We are not sure a pending tlb flush here is for a huge page
+* mapping or not. Hence use the tlb range variant
 */
if (mm_tlb_flush_pending(vma->vm_mm))
-   need_flush = true;
+   flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
 
/*
 * Migrate the THP to the requested node, returns with page unlocked
@@ -1520,13 +1515,6 @@ int do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 */
spin_unlock(vmf->ptl);
 
-   /*
-* We are not sure a pending tlb flush here is for a huge page
-* mapping or not. Hence use the tlb range variant
-*/
-   if (need_flush)
-   flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE);
-
migrated = migrate_misplaced_transhuge_page(vma->vm_mm, vma,
vmf->pmd, pmd, vmf->address, page, target_nid);
if (migrated) {


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-11 Thread Ingo Molnar

* Stephen Rothwell  wrote:

> Hi Peter,
> 
> On Fri, 11 Aug 2017 11:34:49 +0200 Peter Zijlstra  
> wrote:
> >
> > On Fri, Aug 11, 2017 at 05:53:26PM +1000, Stephen Rothwell wrote:
> > > 
> > > Today's linux-next merge of the akpm-current tree got conflicts in:
> > > 
> > >   include/linux/mm_types.h
> > >   mm/huge_memory.c
> > > 
> > > between commit:
> > > 
> > >   8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
> > > 
> > > from the tip tree and commits:
> > > 
> > >   16af97dc5a89 ("mm: migrate: prevent racy access to tlb_flush_pending")
> > >   a9b802500ebb ("Revert "mm: numa: defer TLB flush for THP migration as 
> > > long as possible"")
> > > 
> > > from the akpm-current tree.
> > > 
> > > The latter 2 are now in Linus' tree as well (but were not when I started
> > > the day).
> >
> > Here's two patches that apply on top of tip.
> 
> What I will really need (on Monday) is a merge resolution between
> Linus' tree and the tip tree ...

I've done a minimal conflict resolution merge locally. Peter, could you please 
double check my resolution, in:

  040cca3ab2f6: Merge branch 'linus' into locking/core, to resolve conflicts

Thanks,

Ingo


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-11 Thread Stephen Rothwell
Hi Peter,

On Fri, 11 Aug 2017 11:34:49 +0200 Peter Zijlstra  wrote:
>
> On Fri, Aug 11, 2017 at 05:53:26PM +1000, Stephen Rothwell wrote:
> > 
> > Today's linux-next merge of the akpm-current tree got conflicts in:
> > 
> >   include/linux/mm_types.h
> >   mm/huge_memory.c
> > 
> > between commit:
> > 
> >   8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
> > 
> > from the tip tree and commits:
> > 
> >   16af97dc5a89 ("mm: migrate: prevent racy access to tlb_flush_pending")
> >   a9b802500ebb ("Revert "mm: numa: defer TLB flush for THP migration as 
> > long as possible"")
> > 
> > from the akpm-current tree.
> > 
> > The latter 2 are now in Linus' tree as well (but were not when I started
> > the day).
>
> Here's two patches that apply on top of tip.

What I will really need (on Monday) is a merge resolution between
Linus' tree and the tip tree ...

-- 
Cheers,
Stephen Rothwell


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-11 Thread Peter Zijlstra
On Fri, Aug 11, 2017 at 11:34:49AM +0200, Peter Zijlstra wrote:
> On Fri, Aug 11, 2017 at 05:53:26PM +1000, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Today's linux-next merge of the akpm-current tree got conflicts in:
> > 
> >   include/linux/mm_types.h
> >   mm/huge_memory.c
> > 
> > between commit:
> > 
> >   8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
> > 
> > from the tip tree and commits:
> > 
> >   16af97dc5a89 ("mm: migrate: prevent racy access to tlb_flush_pending")
> >   a9b802500ebb ("Revert "mm: numa: defer TLB flush for THP migration as 
> > long as possible"")
> > 
> > from the akpm-current tree.
> > 
> > The latter 2 are now in Linus' tree as well (but were not when I started
> > the day).
> > 
> > The only way forward I could see was to revert
> > 
> >   8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
> > 
> > and the three following commits
> > 
> >   ff7a5fb0f1d5 ("overlayfs, locking: Remove smp_mb__before_spinlock() 
> > usage")
> >   d89e588ca408 ("locking: Introduce smp_mb__after_spinlock()")
> >   a9668cd6ee28 ("locking: Remove smp_mb__before_spinlock()")
> > 
> > before merging the akpm-current tree again.
> 
> Here's two patches that apply on top of tip.
> 


And here's one to fix the PPC ordering issue I found while doing those
patches.


---
Subject: mm: Fix barrier for inc_tlb_flush_pending() for PPC
From: Peter Zijlstra 
Date: Fri Aug 11 12:43:33 CEST 2017

When we have SPLIT_PTE_PTLOCKS and have RCpc locks (PPC) the UNLOCK of
one does not in fact order against the LOCK of another lock. Therefore
the documented scheme does not work.

Add an explicit smp_mb__after_atomic() to cure things.

Also update the comment to reflect the new inc/dec thing.

Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/mm_types.h |   34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -533,7 +533,7 @@ static inline bool mm_tlb_flush_pending(
 {
/*
 * Must be called with PTL held; such that our PTL acquire will have
-* observed the store from set_tlb_flush_pending().
+* observed the increment from inc_tlb_flush_pending().
 */
return atomic_read(&mm->tlb_flush_pending);
 }
@@ -547,13 +547,11 @@ static inline void inc_tlb_flush_pending
 {
atomic_inc(&mm->tlb_flush_pending);
/*
-* The only time this value is relevant is when there are indeed pages
-* to flush. And we'll only flush pages after changing them, which
-* requires the PTL.
-*
 * So the ordering here is:
 *
-*  mm->tlb_flush_pending = true;
+*  atomic_inc(&mm->tlb_flush_pending)
+*  smp_mb__after_atomic();
+*
 *  spin_lock();
 *  ...
 *  set_pte_at();
@@ -565,17 +563,33 @@ static inline void inc_tlb_flush_pending
 *  spin_unlock();
 *
 *  flush_tlb_range();
-*  mm->tlb_flush_pending = false;
+*  atomic_dec(&mm->tlb_flush_pending);
 *
-* So the =true store is constrained by the PTL unlock, and the =false
-* store is constrained by the TLB invalidate.
+* Where we order the increment against the PTE modification with the
+* smp_mb__after_atomic(). It would appear that the spin_unlock()
+* is sufficient to constrain the inc, because we only care about the
+* value if there is indeed a pending PTE modification. However with
+* SPLIT_PTE_PTLOCKS and RCpc locks (PPC) the UNLOCK of one lock does
+* not order against the LOCK of another lock.
+*
+* The decrement is ordered by the flush_tlb_range(), such that
+* mm_tlb_flush_pending() will not return false unless all flushes have
+* completed.
 */
+   smp_mb__after_atomic();
 }
 
 /* Clearing is done after a TLB flush, which also provides a barrier. */
 static inline void dec_tlb_flush_pending(struct mm_struct *mm)
 {
-   /* see set_tlb_flush_pending */
+   /*
+* See inc_tlb_flush_pending().
+*
+* This cannot be smp_mb__before_atomic() because smp_mb() simply does
+* not order against TLB invalidate completion, which is what we need.
+*
+* Therefore we must rely on tlb_flush_*() to guarantee order.
+*/
atomic_dec(&mm->tlb_flush_pending);
 }
 #else



Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-11 Thread Peter Zijlstra
On Fri, Aug 11, 2017 at 11:34:49AM +0200, Peter Zijlstra wrote:
> On Fri, Aug 11, 2017 at 05:53:26PM +1000, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Today's linux-next merge of the akpm-current tree got conflicts in:
> > 
> >   include/linux/mm_types.h
> >   mm/huge_memory.c
> > 
> > between commit:
> > 
> >   8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
> > 
> > from the tip tree and commits:
> > 
> >   16af97dc5a89 ("mm: migrate: prevent racy access to tlb_flush_pending")
> >   a9b802500ebb ("Revert "mm: numa: defer TLB flush for THP migration as 
> > long as possible"")
> > 
> > from the akpm-current tree.
> > 
> > The latter 2 are now in Linus' tree as well (but were not when I started
> > the day).
> > 
> > The only way forward I could see was to revert
> > 
> >   8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
> > 
> > and the three following commits
> > 
> >   ff7a5fb0f1d5 ("overlayfs, locking: Remove smp_mb__before_spinlock() 
> > usage")
> >   d89e588ca408 ("locking: Introduce smp_mb__after_spinlock()")
> >   a9668cd6ee28 ("locking: Remove smp_mb__before_spinlock()")
> > 
> > before merging the akpm-current tree again.
> 
> Here's two patches that apply on top of tip.
> 


And here's one to fix the PPC ordering issue I found while doing those
patches.


---
Subject: mm: Fix barrier for inc_tlb_flush_pending() for PPC
From: Peter Zijlstra 
Date: Fri Aug 11 12:43:33 CEST 2017

When we have SPLIT_PTE_PTLOCKS and have RCpc locks (PPC) the UNLOCK of
one does not in fact order against the LOCK of another lock. Therefore
the documented scheme does not work.

Add an explicit smp_mb__after_atomic() to cure things.

Also update the comment to reflect the new inc/dec thing.

Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/mm_types.h |   34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -533,7 +533,7 @@ static inline bool mm_tlb_flush_pending(
 {
/*
 * Must be called with PTL held; such that our PTL acquire will have
-* observed the store from set_tlb_flush_pending().
+* observed the increment from inc_tlb_flush_pending().
 */
	return atomic_read(&mm->tlb_flush_pending);
 }
@@ -547,13 +547,11 @@ static inline void inc_tlb_flush_pending
 {
	atomic_inc(&mm->tlb_flush_pending);
/*
-* The only time this value is relevant is when there are indeed pages
-* to flush. And we'll only flush pages after changing them, which
-* requires the PTL.
-*
 * So the ordering here is:
 *
-*  mm->tlb_flush_pending = true;
+*  atomic_inc(&mm->tlb_flush_pending);
+*  smp_mb__after_atomic();
+*
 *  spin_lock(&ptl);
 *  ...
 *  set_pte_at();
@@ -565,17 +563,33 @@ static inline void inc_tlb_flush_pending
 *  spin_unlock(&ptl);
 *
 *  flush_tlb_range();
-*  mm->tlb_flush_pending = false;
+*  atomic_dec(&mm->tlb_flush_pending);
 *
-* So the =true store is constrained by the PTL unlock, and the =false
-* store is constrained by the TLB invalidate.
+* Where we order the increment against the PTE modification with the
+* smp_mb__after_atomic(). It would appear that the spin_unlock()
+* is sufficient to constrain the inc, because we only care about the
+* value if there is indeed a pending PTE modification. However with
+* SPLIT_PTE_PTLOCKS and RCpc locks (PPC) the UNLOCK of one lock does
+* not order against the LOCK of another lock.
+*
+* The decrement is ordered by the flush_tlb_range(), such that
+* mm_tlb_flush_pending() will not return false unless all flushes have
+* completed.
 */
+   smp_mb__after_atomic();
 }
 
 /* Clearing is done after a TLB flush, which also provides a barrier. */
 static inline void dec_tlb_flush_pending(struct mm_struct *mm)
 {
-   /* see set_tlb_flush_pending */
+   /*
+* See inc_tlb_flush_pending().
+*
+* This cannot be smp_mb__before_atomic() because smp_mb() simply does
+* not order against TLB invalidate completion, which is what we need.
+*
+* Therefore we must rely on tlb_flush_*() to guarantee order.
+*/
	atomic_dec(&mm->tlb_flush_pending);
 }
 #else
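
The ordering scheme in the comment above can be modelled with plain C11
atomics. This is a user-space sketch, not the kernel primitives: the
mm_sketch struct and the *_sketch function names are invented here,
atomic_thread_fence(memory_order_seq_cst) stands in for
smp_mb__after_atomic(), and the release ordering on the decrement stands
in for the ordering that flush_tlb_range() provides in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct mm_sketch {
	atomic_int tlb_flush_pending;	/* models mm->tlb_flush_pending */
};

void inc_tlb_flush_pending_sketch(struct mm_sketch *mm)
{
	atomic_fetch_add_explicit(&mm->tlb_flush_pending, 1,
				  memory_order_relaxed);
	/* stand-in for smp_mb__after_atomic(): order the increment
	 * against the subsequent PTE stores, independent of the PTL */
	atomic_thread_fence(memory_order_seq_cst);
}

void dec_tlb_flush_pending_sketch(struct mm_sketch *mm)
{
	/* in the kernel, flush_tlb_range() provides the ordering here */
	atomic_fetch_sub_explicit(&mm->tlb_flush_pending, 1,
				  memory_order_release);
}

bool mm_tlb_flush_pending_sketch(struct mm_sketch *mm)
{
	/* the kernel caller holds the PTL, so a relaxed read suffices */
	return atomic_load_explicit(&mm->tlb_flush_pending,
				    memory_order_relaxed) != 0;
}
```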



Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-11 Thread Peter Zijlstra
On Fri, Aug 11, 2017 at 05:53:26PM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the akpm-current tree got conflicts in:
> 
>   include/linux/mm_types.h
>   mm/huge_memory.c
> 
> between commit:
> 
>   8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
> 
> from the tip tree and commits:
> 
>   16af97dc5a89 ("mm: migrate: prevent racy access to tlb_flush_pending")
>   a9b802500ebb ("Revert "mm: numa: defer TLB flush for THP migration as long 
> as possible"")
> 
> from the akpm-current tree.
> 
> The latter 2 are now in Linus' tree as well (but were not when I started
> the day).
> 
> The only way forward I could see was to revert
> 
>   8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")
> 
> and the three following commits
> 
>   ff7a5fb0f1d5 ("overlayfs, locking: Remove smp_mb__before_spinlock() usage")
>   d89e588ca408 ("locking: Introduce smp_mb__after_spinlock()")
>   a9668cd6ee28 ("locking: Remove smp_mb__before_spinlock()")
> 
> before merging the akpm-current tree again.

Here's two patches that apply on top of tip.

Subject: mm: migrate: prevent racy access to tlb_flush_pending
From: Nadav Amit 
Date: Tue, 1 Aug 2017 17:08:12 -0700

Setting and clearing mm->tlb_flush_pending can be performed by multiple
threads, since mmap_sem may only be acquired for read in
task_numa_work(). If this happens, tlb_flush_pending might be cleared
while one of the threads still changes PTEs and batches TLB flushes.

This can lead to the same race between migration and
change_protection_range() that led to the introduction of
tlb_flush_pending. The result of this race was data corruption, which
means that this patch also addresses a theoretically possible data
corruption.

An actual data corruption was not observed, yet the race was
confirmed by adding assertion to check tlb_flush_pending is not set
by two threads, adding artificial latency in change_protection_range()
and using sysctl to reduce kernel.numa_balancing_scan_delay_ms.

Fixes: 20841405940e ("mm: fix TLB flush race between migration, and
change_protection_range")


Cc: 
Cc: CC: 
Cc: Andy Lutomirski 
Signed-off-by: Nadav Amit 
Acked-by: Mel Gorman 
Acked-by: Rik van Riel 
Acked-by: Minchan Kim 
Signed-off-by: Peter Zijlstra (Intel) 
Link: http://lkml.kernel.org/r/20170802000818.4760-2-na...@vmware.com
---
 include/linux/mm_types.h |   29 +
 kernel/fork.c|2 +-
 mm/debug.c   |2 +-
 mm/mprotect.c|4 ++--
 4 files changed, 25 insertions(+), 12 deletions(-)

--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -493,7 +493,7 @@ struct mm_struct {
 	 * can move process memory needs to flush the TLB when moving a
 	 * PROT_NONE or PROT_NUMA mapped page.
 	 */
-	bool tlb_flush_pending;
+	atomic_t tlb_flush_pending;
 #endif
 #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
 	/* See flush_tlb_batched_pending() */
@@ -535,11 +535,17 @@ static inline bool mm_tlb_flush_pending(
 	 * Must be called with PTL held; such that our PTL acquire will have
 	 * observed the store from set_tlb_flush_pending().
 	 */
-	return mm->tlb_flush_pending;
+	return atomic_read(&mm->tlb_flush_pending);
 }
-static inline void set_tlb_flush_pending(struct mm_struct *mm)
+
+static inline void init_tlb_flush_pending(struct mm_struct *mm)
 {
-	mm->tlb_flush_pending = true;
+	atomic_set(&mm->tlb_flush_pending, 0);
+}
+
+static inline void inc_tlb_flush_pending(struct mm_struct *mm)
+{
+	atomic_inc(&mm->tlb_flush_pending);
 	/*
 	 * The only time this value is relevant is when there are indeed pages
 	 * to flush. And we'll only flush pages after changing them, which
@@ -565,21 +571,28 @@ static inline void set_tlb_flush_pending
 	 * store is constrained by the TLB invalidate.
 	 */
 }
+
 /* Clearing is done after a TLB flush, which also provides a barrier. */
-static inline void clear_tlb_flush_pending(struct mm_struct *mm)
+static inline void dec_tlb_flush_pending(struct mm_struct *mm)
 {
 	/* see set_tlb_flush_pending */
-	mm->tlb_flush_pending = false;
+	atomic_dec(&mm->tlb_flush_pending);
 }
 #else
 static inline bool mm_tlb_flush_pending(struct mm_struct *mm)
 {
 	return false;
 }
-static inline void set_tlb_flush_pending(struct mm_struct *mm)
+
+static inline void init_tlb_flush_pending(struct mm_struct *mm)
 {
 }
-static inline void clear_tlb_flush_pending(struct mm_struct *mm)
+
+static inline void inc_tlb_flush_pending(struct mm_struct *mm)
+{
+}
+
+static inline void dec_tlb_flush_pending(struct mm_struct *mm)
 {
 }
 #endif
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -809,7 +809,7 @@ static struct mm_struct *mm_init(struct
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
 	mmu_notifier_mm_init(mm);
-	clear_tlb_flush_pending(mm);
+	init_tlb_flush_pending(mm);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
 	mm->pmd_huge_pte = NULL;
 #endif
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -159,7 +159,7 @@ void dump_mm(const struct mm_struct 
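
To see why the patch above converts the bool to an atomic_t, the race
can be demonstrated with a user-space model (C11 atomics; all names
here are invented for illustration): with a bool, the second updater's
clear discards the first updater's still-pending state, while a counter
only reads as idle once every updater has decremented.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* pre-patch model: a single bool shared by all batching threads */
atomic_bool flag_pending;
void flag_set(void)   { atomic_store(&flag_pending, true); }
void flag_clear(void) { atomic_store(&flag_pending, false); }
bool flag_read(void)  { return atomic_load(&flag_pending); }

/* post-patch model: a counter, so concurrent updaters nest correctly */
atomic_int ctr_pending;
void ctr_inc(void)  { atomic_fetch_add(&ctr_pending, 1); }
void ctr_dec(void)  { atomic_fetch_sub(&ctr_pending, 1); }
bool ctr_read(void) { return atomic_load(&ctr_pending) != 0; }
```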

linux-next: manual merge of the akpm-current tree with the tip tree

2017-08-11 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got conflicts in:

  include/linux/mm_types.h
  mm/huge_memory.c

between commit:

  8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")

from the tip tree and commits:

  16af97dc5a89 ("mm: migrate: prevent racy access to tlb_flush_pending")
  a9b802500ebb ("Revert "mm: numa: defer TLB flush for THP migration as long as 
possible"")

from the akpm-current tree.

The latter 2 are now in Linus' tree as well (but were not when I started
the day).

The only way forward I could see was to revert

  8b1b436dd1cc ("mm, locking: Rework {set,clear,mm}_tlb_flush_pending()")

and the three following commits

  ff7a5fb0f1d5 ("overlayfs, locking: Remove smp_mb__before_spinlock() usage")
  d89e588ca408 ("locking: Introduce smp_mb__after_spinlock()")
  a9668cd6ee28 ("locking: Remove smp_mb__before_spinlock()")

before merging the akpm-current tree again.

-- 
Cheers,
Stephen Rothwell


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-04-19 Thread NeilBrown
On Wed, Apr 12 2017, Vlastimil Babka wrote:

> On 12.4.2017 8:46, Stephen Rothwell wrote:
>> Hi Andrew,
>> 
>> Today's linux-next merge of the akpm-current tree got conflicts in:
>> 
>>   drivers/block/nbd.c
>>   drivers/scsi/iscsi_tcp.c
>>   net/core/dev.c
>>   net/core/sock.c
>> 
>> between commit:
>> 
>>   717a94b5fc70 ("sched/core: Remove 'task' parameter and rename 
>> tsk_restore_flags() to current_restore_flags()")
>> 
>> from the tip tree and commit:
>> 
>>   61d5ad5b2e8a ("treewide: convert PF_MEMALLOC manipulations to new helpers")
>> 
>> from the akpm-current tree.
>
> Yeah, the first patch from Neil renames a function (as its subject says) and 
> the
> second patch from me converts most of its users to new helpers specific to the
> PF_MEMALLOC flags.
>
>> I fixed it up (the latter is just a superset of the former, so I used
>
> It's not a complete superset though, more on that below.
>
>> that) and can carry the fix as necessary. This is now fixed as far as
>> linux-next is concerned, but any non trivial conflicts should be mentioned
>> to your upstream maintainer when your tree is submitted for merging.
>> You may also want to consider cooperating with the maintainer of the
>> conflicting tree to minimise any particularly complex conflicts.
>
> Hmm I could redo my patch on top of Neil's patch, but then Andrew would have 
> to
> carry Neil's patch as well just to have a working mmotm? And then make sure to
> send my patch (but not Neil's) only after the tip tree is pulled? Would that
> work for the maintainers involved?
>
>> It looks like there may be more instances that the latter patch should
>> update.
>
> I see two remaining instances of current_restore_flags(). One in 
> __do_softirq()
> is even for PF_MEMALLOC, but there the flag is cleared first and then set 
> back,
> which is opposite of the common case that my helpers provide. The other in 
> nfsd
> is for PF_LESS_THROTTLE which is not common enough to earn own helpers yet. 
> IIRC
> Neil originally wanted to add a new one?

[Sorry - I thought I had sent this last week, but just noticed that I didn't]

In general, I'm not a fan of overly-specific helpers.

As a general rule, tsk_restore_flags() is probably better than
current_restore_flags() as it is more general.
However in this specific case, using any task other than 'current' would
almost certainly be incorrect code as locking is impossible.  So I
prefer the 'current' to be implicit, but the actual flag to be explicit.

If you are going to add helpers for setting/clearing PF flags, I would
much rather that you take

#define current_test_flags(f)   (current->flags & (f))
#define current_set_flags_nested(sp, f) \
(*(sp) = current->flags, current->flags |= (f))
#define current_clear_flags_nested(sp, f)   \
(*(sp) = current->flags, current->flags &= ~(f))
#define current_restore_flags_nested(sp, f) \
		(current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))

out of fs/xfs/xfs_linux.h and use them globally.

Your
  noreclaim_flag = memalloc_noreclaim_save()
becomes
  current_set_flags_nested(&noreclaim_flag, PF_MEMALLOC)
which is more typing, but arguably easier to read.

If you then changed all uses of tsk_restore_flags() to use
current_restore_flags_nested(), my patch could be discarded as
irrelevant.

Thanks,
NeilBrown


signature.asc
Description: PGP signature
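
Neil's proposed macros can be exercised in user space by modelling
current->flags as a plain global word of PF_* bits. A sketch (the
current_flags global is a stand-in for current->flags, and the
PF_MEMALLOC value is copied from the kernel purely for illustration)
showing the nested save/restore semantics:

```c
/* stand-in for current->flags */
unsigned int current_flags;

/* value borrowed from the kernel's PF_MEMALLOC, for illustration */
#define PF_MEMALLOC 0x00000800

/* the fs/xfs/xfs_linux.h-style macros Neil quotes above */
#define current_test_flags(f)		(current_flags & (f))
#define current_set_flags_nested(sp, f) \
		(*(sp) = current_flags, current_flags |= (f))
#define current_clear_flags_nested(sp, f) \
		(*(sp) = current_flags, current_flags &= ~(f))
#define current_restore_flags_nested(sp, f) \
		(current_flags = ((current_flags & ~(f)) | (*(sp) & (f))))
```

Note the key nesting property: an inner restore puts back only what the
inner save recorded, so an outer caller that also set the flag does not
lose it.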


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-04-12 Thread Vlastimil Babka
On 12.4.2017 8:46, Stephen Rothwell wrote:
> Hi Andrew,
> 
> Today's linux-next merge of the akpm-current tree got conflicts in:
> 
>   drivers/block/nbd.c
>   drivers/scsi/iscsi_tcp.c
>   net/core/dev.c
>   net/core/sock.c
> 
> between commit:
> 
>   717a94b5fc70 ("sched/core: Remove 'task' parameter and rename 
> tsk_restore_flags() to current_restore_flags()")
> 
> from the tip tree and commit:
> 
>   61d5ad5b2e8a ("treewide: convert PF_MEMALLOC manipulations to new helpers")
> 
> from the akpm-current tree.

Yeah, the first patch from Neil renames a function (as its subject says) and the
second patch from me converts most of its users to new helpers specific to the
PF_MEMALLOC flags.

> I fixed it up (the latter is just a superset of the former, so I used

It's not a complete superset though, more on that below.

> that) and can carry the fix as necessary. This is now fixed as far as
> linux-next is concerned, but any non trivial conflicts should be mentioned
> to your upstream maintainer when your tree is submitted for merging.
> You may also want to consider cooperating with the maintainer of the
> conflicting tree to minimise any particularly complex conflicts.

Hmm I could redo my patch on top of Neil's patch, but then Andrew would have to
carry Neil's patch as well just to have a working mmotm? And then make sure to
send my patch (but not Neil's) only after the tip tree is pulled? Would that
work for the maintainers involved?

> It looks like there may be more instances that the latter patch should
> update.

I see two remaining instances of current_restore_flags(). One in __do_softirq()
is even for PF_MEMALLOC, but there the flag is cleared first and then set back,
which is opposite of the common case that my helpers provide. The other in nfsd
is for PF_LESS_THROTTLE which is not common enough to earn own helpers yet. IIRC
Neil originally wanted to add a new one?



linux-next: manual merge of the akpm-current tree with the tip tree

2017-04-12 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got conflicts in:

  drivers/block/nbd.c
  drivers/scsi/iscsi_tcp.c
  net/core/dev.c
  net/core/sock.c

between commit:

  717a94b5fc70 ("sched/core: Remove 'task' parameter and rename 
tsk_restore_flags() to current_restore_flags()")

from the tip tree and commit:

  61d5ad5b2e8a ("treewide: convert PF_MEMALLOC manipulations to new helpers")

from the akpm-current tree.

I fixed it up (the latter is just a superset of the former, so I used
that) and can carry the fix as necessary. This is now fixed as far as
linux-next is concerned, but any non trivial conflicts should be mentioned
to your upstream maintainer when your tree is submitted for merging.
You may also want to consider cooperating with the maintainer of the
conflicting tree to minimise any particularly complex conflicts.

It looks like there may be more instances that the latter patch should
update.
-- 
Cheers,
Stephen Rothwell


linux-next: manual merge of the akpm-current tree with the tip tree

2017-03-23 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got conflicts in:

  arch/x86/include/asm/atomic.h
  arch/x86/include/asm/atomic64_64.h

between commits:

  a9ebf306f52c ("locking/atomic: Introduce atomic_try_cmpxchg()")
  e6790e4b5d5e ("locking/atomic/x86: Use atomic_try_cmpxchg()")

from the tip tree and commit:

  3f4ca3d25e1a ("asm-generic, x86: wrap atomic operations")

from the akpm-current tree.

I fixed it up (see below - though more work is probably needed) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

The below resolution is not quite right so I added this on top:

From: Stephen Rothwell 
Date: Fri, 24 Mar 2017 16:14:42 +1100
Subject: [PATCH] fix for bad merge fix

Signed-off-by: Stephen Rothwell 
---
 arch/x86/include/asm/atomic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index fc4412567a4a..f717b73182e7 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -217,7 +217,7 @@ static inline void arch_atomic_##op(int i, atomic_t *v) 
\
 }
 
 #define ATOMIC_FETCH_OP(op, c_op)  \
-static inline int atomic_fetch_##op(int i, atomic_t *v)
\
+static inline int arch_atomic_fetch_##op(int i, atomic_t *v)   \
 {  \
int val = arch_atomic_read(v);  \
do {\
-- 
2.11.0

-- 
Cheers,
Stephen Rothwell

diff --cc arch/x86/include/asm/atomic.h
index caa5798c92f4,95dd167eb3af..
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@@ -181,20 -191,14 +191,20 @@@ static __always_inline int arch_atomic_
	return xadd(&v->counter, -i);
  }
  
- static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
+ static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
  {
-   return cmpxchg(&v->counter, old, new);
+   return arch_cmpxchg(&v->counter, old, new);
  }
  
 +#define atomic_try_cmpxchg atomic_try_cmpxchg
 +static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 +{
 +  return try_cmpxchg(&v->counter, old, new);
 +}
 +
- static inline int atomic_xchg(atomic_t *v, int new)
+ static inline int arch_atomic_xchg(atomic_t *v, int new)
  {
-   return xchg(&v->counter, new);
+   return arch_xchg(&v->counter, new);
  }
  
  #define ATOMIC_OP(op) \
@@@ -207,12 -211,16 +217,12 @@@ static inline void arch_atomic_##op(in
  }
  
  #define ATOMIC_FETCH_OP(op, c_op) \
 -static inline int arch_atomic_fetch_##op(int i, atomic_t *v)  \
 +static inline int atomic_fetch_##op(int i, atomic_t *v)   
\
  { \
-   int val = atomic_read(v);   \
 -  int old, val = arch_atomic_read(v); \
 -  for (;;) {  \
 -  old = arch_atomic_cmpxchg(v, val, val c_op i);  \
 -  if (old == val) \
 -  break;  \
 -  val = old;  \
 -  }   \
 -  return old; \
++  int val = arch_atomic_read(v);  \
 +  do {\
 +  } while (!atomic_try_cmpxchg(v, &val, val c_op i)); \
 +  return val; \
  }
  
  #define ATOMIC_OPS(op, c_op)  \
@@@ -236,13 -244,18 +246,13 @@@ ATOMIC_OPS(xor, ^
   * Atomically adds @a to @v, so long as @v was not already @u.
   * Returns the old value of @v.
   */
- static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
+ static __always_inline int __arch_atomic_add_unless(atomic_t *v, int a, int u)
  {
-   int c = atomic_read(v);
 -  int c, old;
 -  c = arch_atomic_read(v);
 -  for (;;) {
 -  if (unlikely(c == (u)))
 -  break;
 -  old = arch_atomic_cmpxchg((v), c, c + (a));
 -  if (likely(old == c))
++  int c = arch_atomic_read(v);
 +  

linux-next: manual merge of the akpm-current tree with the tip tree

2017-03-23 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got conflicts in:

  arch/x86/include/asm/atomic.h
  arch/x86/include/asm/atomic64_64.h

between commits:

  a9ebf306f52c ("locking/atomic: Introduce atomic_try_cmpxchg()")
  e6790e4b5d5e ("locking/atomic/x86: Use atomic_try_cmpxchg()")

from the tip tree and commit:

  3f4ca3d25e1a ("asm-generic, x86: wrap atomic operations")

from the akpm-current tree.

I fixed it up (see below - though more work is probably needed) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

The below resolution is not quite right so I added this on top:

From: Stephen Rothwell 
Date: Fri, 24 Mar 2017 16:14:42 +1100
Subject: [PATCH] fix for bad merge fix

Signed-off-by: Stephen Rothwell 
---
 arch/x86/include/asm/atomic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index fc4412567a4a..f717b73182e7 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -217,7 +217,7 @@ static inline void arch_atomic_##op(int i, atomic_t *v) \
 }
 
 #define ATOMIC_FETCH_OP(op, c_op)  \
-static inline int atomic_fetch_##op(int i, atomic_t *v)	\
+static inline int arch_atomic_fetch_##op(int i, atomic_t *v)   \
 {  \
int val = arch_atomic_read(v);  \
do {\
-- 
2.11.0

-- 
Cheers,
Stephen Rothwell

diff --cc arch/x86/include/asm/atomic.h
index caa5798c92f4,95dd167eb3af..
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@@ -181,20 -191,14 +191,20 @@@ static __always_inline int arch_atomic_
	return xadd(&v->counter, -i);
  }
  
- static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
+ static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
  {
-   return cmpxchg(&v->counter, old, new);
+   return arch_cmpxchg(&v->counter, old, new);
  }
  
 +#define atomic_try_cmpxchg atomic_try_cmpxchg
 +static __always_inline bool atomic_try_cmpxchg(atomic_t *v, int *old, int new)
 +{
 +  return try_cmpxchg(&v->counter, old, new);
 +}
 +
- static inline int atomic_xchg(atomic_t *v, int new)
+ static inline int arch_atomic_xchg(atomic_t *v, int new)
  {
-   return xchg(&v->counter, new);
+   return arch_xchg(&v->counter, new);
  }
  
  #define ATOMIC_OP(op) \
@@@ -207,12 -211,16 +217,12 @@@ static inline void arch_atomic_##op(in
  }
  
  #define ATOMIC_FETCH_OP(op, c_op) \
 -static inline int arch_atomic_fetch_##op(int i, atomic_t *v)  \
 +static inline int atomic_fetch_##op(int i, atomic_t *v)	\
  { \
-   int val = atomic_read(v);   \
 -  int old, val = arch_atomic_read(v); \
 -  for (;;) {  \
 -  old = arch_atomic_cmpxchg(v, val, val c_op i);  \
 -  if (old == val) \
 -  break;  \
 -  val = old;  \
 -  }   \
 -  return old; \
++  int val = arch_atomic_read(v);  \
 +  do {\
 +  } while (!atomic_try_cmpxchg(v, &val, val c_op i)); \
 +  return val; \
  }
  
  #define ATOMIC_OPS(op, c_op)  \
@@@ -236,13 -244,18 +246,13 @@@ ATOMIC_OPS(xor, ^
   * Atomically adds @a to @v, so long as @v was not already @u.
   * Returns the old value of @v.
   */
- static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
+ static __always_inline int __arch_atomic_add_unless(atomic_t *v, int a, int u)
  {
-   int c = atomic_read(v);
 -  int c, old;
 -  c = arch_atomic_read(v);
 -  for (;;) {
 -  if (unlikely(c == (u)))
 -  break;
 -  old = arch_atomic_cmpxchg((v), c, c + (a));
 -  if (likely(old == c))
++  int c = arch_atomic_read(v);
 +  do {
 +  if (unlikely(c == u))

linux-next: manual merge of the akpm-current tree with the tip tree

2017-02-16 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got conflicts in:

  arch/cris/include/asm/Kbuild
  arch/m32r/include/asm/Kbuild
  arch/parisc/include/asm/Kbuild
  arch/score/include/asm/Kbuild

between commit:

  b672592f0221 ("sched/cputime: Remove generic asm headers")

from the tip tree and commits:

  ccbd1433 ("cris: use generic current.h")
  103c58f13b54 ("m32r: use generic current.h")
  35a25dde31aa ("score: remove asm/current.h")
  c6b552bc22c7 ("parisc: use generic current.h")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/cris/include/asm/Kbuild
index 9f19e19bff9d,5e320f660c3c..
--- a/arch/cris/include/asm/Kbuild
+++ b/arch/cris/include/asm/Kbuild
@@@ -4,6 -4,8 +4,7 @@@ generic-y += barrier.
  generic-y += bitsperlong.h
  generic-y += clkdev.h
  generic-y += cmpxchg.h
 -generic-y += cputime.h
+ generic-y += current.h
  generic-y += device.h
  generic-y += div64.h
  generic-y += errno.h
diff --cc arch/m32r/include/asm/Kbuild
index 652100b64a71,30ee92ff0244..
--- a/arch/m32r/include/asm/Kbuild
+++ b/arch/m32r/include/asm/Kbuild
@@@ -1,5 -1,7 +1,6 @@@
  
  generic-y += clkdev.h
 -generic-y += cputime.h
+ generic-y += current.h
  generic-y += exec.h
  generic-y += irq_work.h
  generic-y += kvm_para.h
diff --cc arch/parisc/include/asm/Kbuild
index 4e179d770d69,7ac070267672..
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@@ -2,6 -2,8 +2,7 @@@
  generic-y += auxvec.h
  generic-y += barrier.h
  generic-y += clkdev.h
 -generic-y += cputime.h
+ generic-y += current.h
  generic-y += device.h
  generic-y += div64.h
  generic-y += emergency-restart.h
diff --cc arch/score/include/asm/Kbuild
index 51970bb6c4fe,620970f837bc..
--- a/arch/score/include/asm/Kbuild
+++ b/arch/score/include/asm/Kbuild
@@@ -4,6 -4,8 +4,7 @@@ header-y +
  
  generic-y += barrier.h
  generic-y += clkdev.h
 -generic-y += cputime.h
+ generic-y += current.h
  generic-y += irq_work.h
  generic-y += mcs_spinlock.h
  generic-y += mm-arch-hooks.h


linux-next: manual merge of the akpm-current tree with the tip tree

2016-11-13 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  mm/memcontrol.c

between commit:

  308167fcb330 ("mm/memcg: Convert to hotplug state machine")

from the tip tree and commit:

  2558c318449d ("mm: memcontrol: use special workqueue for creating per-memcg caches")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc mm/memcontrol.c
index 6c2043509fb5,91dfc7c5ce8f..
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@@ -5774,8 -5785,18 +5776,19 @@@ static int __init mem_cgroup_init(void
  {
int cpu, node;
  
+ #ifndef CONFIG_SLOB
+   /*
+* Kmem cache creation is mostly done with the slab_mutex held,
+* so use a special workqueue to avoid stalling all worker
+* threads in case lots of cgroups are created simultaneously.
+*/
+   memcg_kmem_cache_create_wq =
+   alloc_ordered_workqueue("memcg_kmem_cache_create", 0);
+   BUG_ON(!memcg_kmem_cache_create_wq);
+ #endif
+ 
 -  hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
 +  cpuhp_setup_state_nocalls(CPUHP_MM_MEMCQ_DEAD, "mm/memctrl:dead", NULL,
 +memcg_hotplug_cpu_dead);
  
for_each_possible_cpu(cpu)
	INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,


linux-next: manual merge of the akpm-current tree with the tip tree

2016-07-28 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  arch/x86/include/asm/thread_info.h

between commit:

  609c19a385c8 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")

from the tip tree and commit:

  58f9594bd42f ("signal: consolidate {TS,TLF}_RESTORE_SIGMASK code")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/x86/include/asm/thread_info.h
index d4b0fd24a63e,b45ffdda3549..
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@@ -263,35 -219,8 +263,11 @@@ static inline int arch_within_stack_fra
   * have to worry about atomic accesses.
   */
  #define TS_COMPAT 0x0002  /* 32bit syscall active (64BIT)*/
 +#ifdef CONFIG_COMPAT
 +#define TS_I386_REGS_POKED0x0004  /* regs poked by 32-bit ptracer */
 +#endif
- #define TS_RESTORE_SIGMASK0x0008  /* restore signal mask in do_signal() */
  
  #ifndef __ASSEMBLY__
- #define HAVE_SET_RESTORE_SIGMASK  1
- static inline void set_restore_sigmask(void)
- {
-   struct thread_info *ti = current_thread_info();
-   ti->status |= TS_RESTORE_SIGMASK;
-   WARN_ON(!test_bit(TIF_SIGPENDING, (unsigned long *)&ti->flags));
- }
- static inline void clear_restore_sigmask(void)
- {
-   current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
- }
- static inline bool test_restore_sigmask(void)
- {
-   return current_thread_info()->status & TS_RESTORE_SIGMASK;
- }
- static inline bool test_and_clear_restore_sigmask(void)
- {
-   struct thread_info *ti = current_thread_info();
-   if (!(ti->status & TS_RESTORE_SIGMASK))
-   return false;
-   ti->status &= ~TS_RESTORE_SIGMASK;
-   return true;
- }
  
  static inline bool in_ia32_syscall(void)
  {


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2016-06-18 Thread Manfred Spraul

Hi,

On 06/15/2016 07:23 AM, Stephen Rothwell wrote:

Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

   ipc/sem.c

between commit:

   33ac279677dc ("locking/barriers: Introduce smp_acquire__after_ctrl_dep()")

from the tip tree and commit:

   a1c58ea067cb ("ipc/sem.c: Fix complex_count vs. simple op race")

from the akpm-current tree.

Just in case, I have created a rediff of my patch against -tip.
And the patch with hysteresis would be ready as well.

I will send both patches.

More testers would be welcome, I can only test it on my laptop.

--
Manfred


linux-next: manual merge of the akpm-current tree with the tip tree

2016-06-14 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  ipc/sem.c

between commit:

  33ac279677dc ("locking/barriers: Introduce smp_acquire__after_ctrl_dep()")

from the tip tree and commit:

  a1c58ea067cb ("ipc/sem.c: Fix complex_count vs. simple op race")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc ipc/sem.c
index ae72b3cddc8d,11d9e605a619..
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@@ -260,13 -267,20 +267,10 @@@ static void sem_rcu_free(struct rcu_hea
  }
  
  /*
-  * Wait until all currently ongoing simple ops have completed.
 - * spin_unlock_wait() and !spin_is_locked() are not memory barriers, they
 - * are only control barriers.
 - * The code must pair with spin_unlock(&sem->lock) or
 - * spin_unlock(&sma->sem_perm.lock), thus just the control barrier is insufficient.
 - *
 - * smp_rmb() is sufficient, as writes cannot pass the control barrier.
 - */
 -#define ipc_smp_acquire__after_spin_is_unlocked() smp_rmb()
 -
 -/*
+  * Enter the mode suitable for non-simple operations:
   * Caller must own sem_perm.lock.
-  * New simple ops cannot start, because simple ops first check
-  * that sem_perm.lock is free.
+  * New simple ops cannot start, because simple ops first check
+  * that a) sem_perm.lock is free and b) complex_count is 0.
   */
- static void sem_wait_array(struct sem_array *sma)
+ static void complexmode_enter(struct sem_array *sma)
  {
int i;
struct sem *sem;


Re: linux-next: manual merge of the akpm-current tree with the tip tree

2016-04-29 Thread Ingo Molnar

* Stephen Rothwell  wrote:

> Hi Andrew,
> 
> Today's linux-next merge of the akpm-current tree got a conflict in:
> 
>   include/linux/efi.h
> 
> between commit:
> 
>   2c23b73c2d02 ("Ard Biesheuvel ")
> 
> from the tip tree and commit:
> 
>   9f2c36a7b097 ("include/linux/efi.h: redefine type, constant, macro from generic code")
> 
> from the akpm-current tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Btw., while looking at this, I noticed that akpm-current introduced this 
namespace 
collision:

include/acpi/acconfig.h:#define UUID_STRING_LENGTH  36  /* Total length of a UUID string */
include/linux/uuid.h:#define    UUID_STRING_LEN     36

I suspect the include/acpi/acconfig.h define should be renamed:

UUID_STRING_LENGTH -> ACPI_UUID_STRING_LENGTH
UUID_BUFFER_LENGTH -> ACPI_UUID_BUFFER_LENGTH

... before the collision causes any trouble.

Thanks,

Ingo

