On 3/16/21 4:40 AM, Anup Patel wrote:
On Tue, Mar 16, 2021 at 1:59 PM Andrew Waterman
<water...@eecs.berkeley.edu> wrote:

On Tue, Mar 16, 2021 at 12:32 AM Anup Patel <a...@brainfault.org> wrote:

On Tue, Mar 16, 2021 at 12:27 PM Jiuyang Liu <l...@jiuyang.me> wrote:

As per my understanding, we don't need to explicitly invalidate the local TLB
in set_pte() or set_pte_at() because generic Linux page table management
(<linux>/mm/*) will call the appropriate flush_tlb_xyz() function after page
table updates.

I witnessed this bug on our micro-architecture: the store from set_pte() is
still sitting in the store buffer, and no function in the call stack below
inserts an SFENCE.VMA, so the TLB never observes the modification.
Here is my call stack:
set_pte
set_pte_at
map_vm_area
__vmalloc_area_node
__vmalloc_node_range
__vmalloc_node
__vmalloc_node_flags
vzalloc
n_tty_open


I can't find this call stack; what I find is the following (listed the other way around):

n_tty_open
vzalloc
__vmalloc_node
__vmalloc_node_range
__vmalloc_area_node
map_kernel_range
-> map_kernel_range_noflush
   flush_cache_vmap

Which shows that we don't have the flush_cache_vmap() callback implemented: shouldn't we add the sfence.vma there? powerpc does something similar with the "ptesync" instruction (see below), which seems to do the same job as sfence.vma.

ptesync: "The ptesync instruction after the Store instruction ensures that all searches of the Page Table that are performed after the ptesync instruction completes will use the value stored"

I think this is architecture-specific code, so <linux>/mm/* should not be
modified. The spec requires an SFENCE.VMA after each modification of the
page tables, so I added the code here.

The generic linux/mm/* code already calls the appropriate flush_tlb_xyz()
function defined in arch/riscv/include/asm/tlbflush.h.

Better to have a write-barrier in set_pte().


Also, just a local TLB flush is generally not sufficient because a lot of
page tables will be used across multiple HARTs.

Yes, this is the biggest issue. RISC-V Volume 2, Privileged Spec v20190608,
page 67, gives a solution:

This is not an issue with the RISC-V privileged spec; rather, it is about
placing RISC-V fences in the right locations.

Consequently, other harts must be notified separately when the
memory-management data structures have been modified. One approach is
to use
1) a local data fence to ensure local writes are visible globally,
then 2) an interprocessor interrupt to the other thread,
then 3) a local SFENCE.VMA in the interrupt handler of the remote thread,
and finally 4) signal back to originating thread that operation is
complete. This is, of course, the RISC-V analog to a TLB shootdown.
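
For illustration (not part of this patch), the four steps above map roughly
onto existing kernel primitives; a hypothetical helper could look like this:

#include <linux/smp.h>
#include <asm/barrier.h>
#include <asm/tlbflush.h>

/* Step 3: each hart, including the sender, runs a local SFENCE.VMA. */
static void ipi_local_sfence_vma(void *info)
{
        local_flush_tlb_all();
}

static void riscv_tlb_shootdown(void)
{
        /* Step 1: make the local page-table stores globally visible. */
        wmb();
        /* Steps 2 and 4: IPI every CPU and wait for completion. */
        on_each_cpu(ipi_local_sfence_vma, NULL, 1);
}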

I would suggest trying approach #1.

You can include "asm/barrier.h" here and use wmb() or __smp_wmb() in place
of the local TLB flush.
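
A minimal sketch of that suggestion (hypothetical, and as Andrew points out
below, a plain store barrier may not order the PTE store against the hardware
page-table walker):

#include <asm/barrier.h>

static inline void set_pte(pte_t *ptep, pte_t pteval)
{
        *ptep = pteval;
        /* Publish the PTE store before it can be consumed elsewhere. */
        wmb();
}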

wmb() doesn't suffice to order older stores before younger page-table
walks, so that might hide the problem without actually fixing it.

If we treat page-table walks as reads, then mb() might be more suitable
in this case?

ARM64 also has an explicit barrier in its set_pte() implementation. It does
"dsb(ishst); isb()", which is an inner-shareable store barrier followed
by an instruction barrier.
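
For reference, the arm64 set_pte() is roughly as follows (paraphrased from
arch/arm64/include/asm/pgtable.h; the exact guard condition varies between
kernel versions):

static inline void set_pte(pte_t *ptep, pte_t pte)
{
        WRITE_ONCE(*ptep, pte);

        /*
         * Only if the new pte is valid and kernel, otherwise TLB maintenance
         * or update_mmu_cache() have the necessary barriers.
         */
        if (pte_valid_not_user(pte)) {
                dsb(ishst);     /* inner-shareable store barrier */
                isb();          /* instruction synchronization barrier */
        }
}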


Based upon Jiuyang's description, it does sound plausible that we are
missing an SFENCE.VMA (or TLB shootdown) somewhere.  But I don't
understand the situation well enough to know where that might be, or
what the best fix is.

Yes, I agree, but set_pte() doesn't seem to be the right place for a TLB
shootdown, judging by the set_pte() implementations of other architectures.

I agree: "flushing" the TLB after every set_pte() would be very costly; it's better to do it once at the end of all the updates, as in flush_cache_vmap() :)

Alex


Regards,
Anup





In general, this patch didn't handle the G bit in the PTE with a trap
to sbi_remote_sfence_vma(). Do you think I should use flush_tlb_all()?
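
For reference, flush_tlb_all() on RISC-V currently asks SBI to fence every
hart, roughly (from arch/riscv/mm/tlbflush.c):

void flush_tlb_all(void)
{
        /* Remote SFENCE.VMA over the whole address space on all harts. */
        sbi_remote_sfence_vma(NULL, 0, -1);
}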

Jiuyang




arch/arm/mm/mmu.c
void set_pte_at(struct mm_struct *mm, unsigned long addr,
                               pte_t *ptep, pte_t pteval)
{
         unsigned long ext = 0;

         if (addr < TASK_SIZE && pte_valid_user(pteval)) {
                 if (!pte_special(pteval))
                         __sync_icache_dcache(pteval);
                 ext |= PTE_EXT_NG;
         }

         set_pte_ext(ptep, pteval, ext);
}

arch/mips/include/asm/pgtable.h
static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
                               pte_t *ptep, pte_t pteval)
{

         if (!pte_present(pteval))
                 goto cache_sync_done;

         if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
                 goto cache_sync_done;

         __update_cache(addr, pteval);
cache_sync_done:
         set_pte(ptep, pteval);
}


Also, just a local TLB flush is generally not sufficient because a lot of
page tables will be used across multiple HARTs.


On Tue, Mar 16, 2021 at 5:05 AM Anup Patel <a...@brainfault.org> wrote:

+Alex

On Tue, Mar 16, 2021 at 9:20 AM Jiuyang Liu <l...@jiuyang.me> wrote:

This patch inserts SFENCE.VMA after modifying a PTE, based on the RISC-V
specification.

arch/riscv/include/asm/pgtable.h:
1. implement pte_user, pte_global and pte_leaf to check the corresponding
attributes of a pte_t.

Adding pte_user(), pte_global(), and pte_leaf() is fine.


2. insert SFENCE.VMA in set_pte_at based on RISC-V Volume 2, Privileged
Spec v. 20190608 page 66 and 67:
If software modifies a non-leaf PTE, it should execute SFENCE.VMA with
rs1=x0. If any PTE along the traversal path had its G bit set, rs2 must
be x0; otherwise, rs2 should be set to the ASID for which the
translation is being modified.
If software modifies a leaf PTE, it should execute SFENCE.VMA with rs1
set to a virtual address within the page. If any PTE along the traversal
path had its G bit set, rs2 must be x0; otherwise, rs2 should be set to
the ASID for which the translation is being modified.
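
For illustration, the leaf-PTE, non-global case above corresponds to an
address- plus ASID-targeted fence; a hypothetical helper (not part of this
patch, which uses local_flush_tlb_page() for leaf PTEs) could look like:

static inline void local_flush_tlb_page_asid(unsigned long addr,
                                             unsigned long asid)
{
        __asm__ __volatile__ ("sfence.vma %0, %1"
                              : : "r" (addr), "r" (asid)
                              : "memory");
}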

arch/riscv/include/asm/tlbflush.h:
1. implement get_current_asid to read the ASID of the current process.
2. implement local_flush_tlb_asid to flush the TLB by ASID.

As per my understanding, we don't need to explicitly invalidate the local TLB
in set_pte() or set_pte_at() because generic Linux page table management
(<linux>/mm/*) will call the appropriate flush_tlb_xyz() function after page
table updates. Also, just a local TLB flush is generally not sufficient
because a lot of page tables will be used across multiple HARTs.


Signed-off-by: Jiuyang Liu <l...@jiuyang.me>
---
  arch/riscv/include/asm/pgtable.h  | 27 +++++++++++++++++++++++++++
  arch/riscv/include/asm/tlbflush.h | 12 ++++++++++++
  2 files changed, 39 insertions(+)

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ebf817c1bdf4..5a47c60372c1 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -222,6 +222,16 @@ static inline int pte_write(pte_t pte)
         return pte_val(pte) & _PAGE_WRITE;
  }

+static inline int pte_user(pte_t pte)
+{
+       return pte_val(pte) & _PAGE_USER;
+}
+
+static inline int pte_global(pte_t pte)
+{
+       return pte_val(pte) & _PAGE_GLOBAL;
+}
+
  static inline int pte_exec(pte_t pte)
  {
         return pte_val(pte) & _PAGE_EXEC;
@@ -248,6 +258,11 @@ static inline int pte_special(pte_t pte)
         return pte_val(pte) & _PAGE_SPECIAL;
  }

+static inline int pte_leaf(pte_t pte)
+{
+       return pte_val(pte) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC);
+}
+
  /* static inline pte_t pte_rdprotect(pte_t pte) */

  static inline pte_t pte_wrprotect(pte_t pte)
@@ -358,6 +373,18 @@ static inline void set_pte_at(struct mm_struct *mm,
                 flush_icache_pte(pteval);

         set_pte(ptep, pteval);
+
+       if (pte_present(pteval)) {
+               if (pte_leaf(pteval)) {
+                       local_flush_tlb_page(addr);
+               } else {
+                       if (pte_global(pteval))
+                               local_flush_tlb_all();
+                       else
+                               local_flush_tlb_asid();
+
+               }
+       }
  }

  static inline void pte_clear(struct mm_struct *mm,
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 394cfbccdcd9..1f9b62b3670b 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -21,6 +21,18 @@ static inline void local_flush_tlb_page(unsigned long addr)
  {
         __asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory");
  }
+
+static inline unsigned long get_current_asid(void)
+{
+       return (csr_read(CSR_SATP) >> SATP_ASID_SHIFT) & SATP_ASID_MASK;
+}
+
+static inline void local_flush_tlb_asid(void)
+{
+       unsigned long asid = get_current_asid();
+       __asm__ __volatile__ ("sfence.vma x0, %0" : : "r" (asid) : "memory");
+}
+
  #else /* CONFIG_MMU */
  #define local_flush_tlb_all()                  do { } while (0)
  #define local_flush_tlb_page(addr)             do { } while (0)
--
2.30.2


Regards,
Anup

Regards,
Anup
