Excerpts from Christopher M. Riedl's message of July 1, 2021 3:28 pm:
> On Wed Jun 30, 2021 at 11:15 PM CDT, Nicholas Piggin wrote:
>> Excerpts from Christopher M. Riedl's message of July 1, 2021 1:48 pm:
>> > On Sun Jun 20, 2021 at 10:13 PM CDT, Daniel Axtens wrote:
>> >> "Christopher M. Riedl" <c...@linux.ibm.com> writes:
>> >>
>> >> > Switching to a different mm with Hash translation causes SLB entries to
>> >> > be preloaded from the current thread_info. This reduces SLB faults, for
>> >> > example when threads share a common mm but operate on different address
>> >> > ranges.
>> >> >
>> >> > Preloading entries from the thread_info struct may not always be
>> >> > appropriate - such as when switching to a temporary mm. Introduce a new
>> >> > boolean in mm_context_t to skip the SLB preload entirely. Also move the
>> >> > SLB preload code into a separate function since switch_slb() is already
>> >> > quite long. The default behavior (preloading SLB entries from the
>> >> > current thread_info struct) remains unchanged.
>> >> >
>> >> > Signed-off-by: Christopher M. Riedl <c...@linux.ibm.com>
>> >> >
>> >> > ---
>> >> >
>> >> > v4:  * New to series.
>> >> > ---
>> >> >  arch/powerpc/include/asm/book3s/64/mmu.h |  3 ++
>> >> >  arch/powerpc/include/asm/mmu_context.h   | 13 ++++++
>> >> >  arch/powerpc/mm/book3s64/mmu_context.c   |  2 +
>> >> >  arch/powerpc/mm/book3s64/slb.c           | 56 ++++++++++++++----------
>> >> >  4 files changed, 50 insertions(+), 24 deletions(-)
>> >> >
>> >> > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
>> >> > index eace8c3f7b0a1..b23a9dcdee5af 100644
>> >> > --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>> >> > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>> >> > @@ -130,6 +130,9 @@ typedef struct {
>> >> >         u32 pkey_allocation_map;
>> >> >         s16 execute_only_pkey; /* key holding execute-only protection */
>> >> >  #endif
>> >> > +
>> >> > +       /* Do not preload SLB entries from thread_info during switch_slb() */
>> >> > +       bool skip_slb_preload;
>> >> >  } mm_context_t;
>> >> >  
>> >> >  static inline u16 mm_ctx_user_psize(mm_context_t *ctx)
>> >> > diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
>> >> > index 4bc45d3ed8b0e..264787e90b1a1 100644
>> >> > --- a/arch/powerpc/include/asm/mmu_context.h
>> >> > +++ b/arch/powerpc/include/asm/mmu_context.h
>> >> > @@ -298,6 +298,19 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm,
>> >> >         return 0;
>> >> >  }
>> >> >  
>> >> > +#ifdef CONFIG_PPC_BOOK3S_64
>> >> > +
>> >> > +static inline void skip_slb_preload_mm(struct mm_struct *mm)
>> >> > +{
>> >> > +       mm->context.skip_slb_preload = true;
>> >> > +}
>> >> > +
>> >> > +#else
>> >> > +
>> >> > +static inline void skip_slb_preload_mm(struct mm_struct *mm) {}
>> >> > +
>> >> > +#endif /* CONFIG_PPC_BOOK3S_64 */
>> >> > +
>> >> >  #include <asm-generic/mmu_context.h>
>> >> >  
>> >> >  #endif /* __KERNEL__ */
>> >> > diff --git a/arch/powerpc/mm/book3s64/mmu_context.c b/arch/powerpc/mm/book3s64/mmu_context.c
>> >> > index c10fc8a72fb37..3479910264c59 100644
>> >> > --- a/arch/powerpc/mm/book3s64/mmu_context.c
>> >> > +++ b/arch/powerpc/mm/book3s64/mmu_context.c
>> >> > @@ -202,6 +202,8 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
>> >> >         atomic_set(&mm->context.active_cpus, 0);
>> >> >         atomic_set(&mm->context.copros, 0);
>> >> >  
>> >> > +       mm->context.skip_slb_preload = false;
>> >> > +
>> >> >         return 0;
>> >> >  }
>> >> >  
>> >> > diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
>> >> > index c91bd85eb90e3..da0836cb855af 100644
>> >> > --- a/arch/powerpc/mm/book3s64/slb.c
>> >> > +++ b/arch/powerpc/mm/book3s64/slb.c
>> >> > @@ -441,10 +441,39 @@ static void slb_cache_slbie_user(unsigned int index)
>> >> >         asm volatile("slbie %0" : : "r" (slbie_data));
>> >> >  }
>> >> >  
>> >> > +static void preload_slb_entries(struct task_struct *tsk, struct mm_struct *mm)
>> >> Should this be explicitly inline or even __always_inline? I'm thinking
>> >> switch_slb is probably a fairly hot path on hash?
>> > 
>> > Yes absolutely. I'll make this change in v5.
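>> > 
>> > i.e. something like this for v5 (sketch only; the body stays the same,
>> > it is just forced inline since switch_slb() is a hot path):
>> > 
>> >     /* same body as below, only the signature changes */
>> >     static __always_inline void preload_slb_entries(struct task_struct *tsk,
>> >                                                     struct mm_struct *mm)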
>> > 
>> >>
>> >> > +{
>> >> > +       struct thread_info *ti = task_thread_info(tsk);
>> >> > +       unsigned char i;
>> >> > +
>> >> > +       /*
>> >> > +        * We gradually age out SLBs after a number of context switches to
>> >> > +        * reduce reload overhead of unused entries (like we do with FP/VEC
>> >> > +        * reload). Each time we wrap 256 switches, take an entry out of the
>> >> > +        * SLB preload cache.
>> >> > +        */
>> >> > +       tsk->thread.load_slb++;
>> >> > +       if (!tsk->thread.load_slb) {
>> >> > +               unsigned long pc = KSTK_EIP(tsk);
>> >> > +
>> >> > +               preload_age(ti);
>> >> > +               preload_add(ti, pc);
>> >> > +       }
>> >> > +
>> >> > +       for (i = 0; i < ti->slb_preload_nr; i++) {
>> >> > +               unsigned char idx;
>> >> > +               unsigned long ea;
>> >> > +
>> >> > +               idx = (ti->slb_preload_tail + i) % SLB_PRELOAD_NR;
>> >> > +               ea = (unsigned long)ti->slb_preload_esid[idx] << SID_SHIFT;
>> >> > +
>> >> > +               slb_allocate_user(mm, ea);
>> >> > +       }
>> >> > +}
>> >> > +
>> >> >  /* Flush all user entries from the segment table of the current processor. */
>> >> >  void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>> >> >  {
>> >> > -       struct thread_info *ti = task_thread_info(tsk);
>> >> >         unsigned char i;
>> >> >  
>> >> >         /*
>> >> > @@ -502,29 +531,8 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>> >> >  
>> >> >         copy_mm_to_paca(mm);
>> >> >  
>> >> > -       /*
>> >> > -        * We gradually age out SLBs after a number of context switches to
>> >> > -        * reduce reload overhead of unused entries (like we do with FP/VEC
>> >> > -        * reload). Each time we wrap 256 switches, take an entry out of the
>> >> > -        * SLB preload cache.
>> >> > -        */
>> >> > -       tsk->thread.load_slb++;
>> >> > -       if (!tsk->thread.load_slb) {
>> >> > -               unsigned long pc = KSTK_EIP(tsk);
>> >> > -
>> >> > -               preload_age(ti);
>> >> > -               preload_add(ti, pc);
>> >> > -       }
>> >> > -
>> >> > -       for (i = 0; i < ti->slb_preload_nr; i++) {
>> >> > -               unsigned char idx;
>> >> > -               unsigned long ea;
>> >> > -
>> >> > -               idx = (ti->slb_preload_tail + i) % SLB_PRELOAD_NR;
>> >> > -               ea = (unsigned long)ti->slb_preload_esid[idx] << SID_SHIFT;
>> >> > -
>> >> > -               slb_allocate_user(mm, ea);
>> >> > -       }
>> >> > +       if (!mm->context.skip_slb_preload)
>> >> > +               preload_slb_entries(tsk, mm);
>> >>
>> >> Should this be wrapped in likely()?
>> > 
>> > Seems like a good idea - yes.
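>> > 
>> > i.e. for v5 (sketch):
>> > 
>> >     if (likely(!mm->context.skip_slb_preload))
>> >             preload_slb_entries(tsk, mm);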
>> > 
>> >>
>> >> >  
>> >> >         /*
>> >> >          * Synchronize slbmte preloads with possible subsequent user memory
>> >>
>> >> Right below this comment is the isync. It seems to be specifically
>> >> concerned with synchronising preloaded slbs. Do you need it if you are
>> >> skipping SLB preloads?
>> >>
>> >> It's probably not a big deal to have an extra isync in the fairly rare
>> >> path when we're skipping preloads, but I thought I'd check.
>> > 
>> > I don't _think_ we need the `isync` if we are skipping the SLB preloads,
>> > but then again it was always in the code path before. If someone can
>> > make a compelling argument to drop it when not preloading SLBs I will;
>> > otherwise (considering some of the other non-obvious things I stepped
>> > into with the Hash code) I will keep it here for now.
>>
>> The ISA says slbia wants an isync afterward, so we probably should keep
>> it. The comment is a bit misleading in that case.
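>> 
>> E.g. the comment could mention both reasons; wording is only a
>> suggestion:
>> 
>> 	/*
>> 	 * isync: required after the slbia above per the ISA, and it also
>> 	 * synchronizes slbmte preloads with possible subsequent user
>> 	 * memory accesses by the kernel.
>> 	 */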
>>
>> Why isn't preloading appropriate for a temporary mm?
> 
> The preloaded entries come from the thread_info struct, which isn't
> necessarily related to the temporary mm at all. I saw SLB multihits
> while testing this series with my LKDTM test: the "patching address"
> (the userspace address for the temporary mapping w/ write permissions)
> ends up in a thread's preload list, and then we explicitly insert it
> again in map_patch() when trying to patch. At that point the SLB
> multihit triggers.
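> 
> The intent is that the temporary-mm setup later in the series opts out
> via the new helper, roughly (sketch; "patching_mm" is just a placeholder
> name for that temporary mm):
> 
>     /* when creating the temporary patching mm */
>     skip_slb_preload_mm(patching_mm);
> 
> so switch_slb() never preloads thread_info entries on top of the SLB
> entry that map_patch() inserts explicitly.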

Hmm, so what if we use a mm, take some SLB faults, then unuse it and
use a different one? I wonder if kthread_use_mm has existing problems
with this incorrect SLB preloading. Quite possibly. We should clear
the preload cache whenever the mm changes, I think. That should cover
this case as well.
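
Something like this, perhaps (completely untested sketch; "prev_slb_mm"
is a made-up field just to illustrate the idea, it doesn't exist today):

	/* in switch_slb(), before considering any preload */
	struct thread_info *ti = task_thread_info(tsk);

	if (tsk->thread.prev_slb_mm != mm) {
		/* the mm changed: drop the now-stale preload cache */
		ti->slb_preload_nr = 0;
		ti->slb_preload_tail = 0;
		tsk->thread.prev_slb_mm = mm;
	}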

Thanks,
Nick
