On Tue, 2022-08-09 at 05:51 +0000, Christophe Leroy wrote: > Le 09/08/2022 à 04:44, Russell Currey a écrit : > > The Hash MMU already supports XOM (i.e. mmap with PROT_EXEC only) > > through the execute-only pkey. A PROT_EXEC-only mapping will > > actually > > map to RX, and then the pkey will be applied on top of it. > > I don't think XOM is a commonly understood accronym. Maybe the first > time you use it it'd be better to say something like: > > The Hash MMU already supports execute-only memory (XOM)
Yes, that's better. > > When you say that Hash MMU supports it through the execute-only pkey, > does it mean that it is taken into account automatically at mmap > time, > or does the userspace app has to do something special to use the key > ? > If it is the second, it means that depending on whether you are radix > or > not, you must do something different ? Is that expected ? It happens at mmap time, see do_mmap() in mm/mmap.c (and similar for mprotect). That calls into execute_only_pkey() which can return something on x86 & Hash, and if it does that pkey gets used. The userspace process doesn't have to do anything, it's transparent. So there's no difference in program behaviour switching between Hash/Radix - at least in the basic cases I've tested. > > > > > Radix doesn't have pkeys, but it does have execute permissions > > built-in > > to the MMU, so all we have to do to support XOM is expose it. > > > > Signed-off-by: Russell Currey <rus...@russell.cc> > > --- > > v3: Incorporate Aneesh's suggestions, leave protection_map > > untouched > > Basic test: > > https://github.com/ruscur/junkcode/blob/main/mmap_test.c > > > > arch/powerpc/include/asm/book3s/64/pgtable.h | 2 ++ > > arch/powerpc/mm/book3s64/pgtable.c | 11 +++++++++-- > > arch/powerpc/mm/fault.c | 6 +++++- > > 3 files changed, 16 insertions(+), 3 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h > > b/arch/powerpc/include/asm/book3s/64/pgtable.h > > index 392ff48f77df..486902aff040 100644 > > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h > > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h > > @@ -151,6 +151,8 @@ > > #define PAGE_COPY_X __pgprot(_PAGE_BASE | _PAGE_READ | > > _PAGE_EXEC) > > #define PAGE_READONLY __pgprot(_PAGE_BASE | _PAGE_READ) > > #define PAGE_READONLY_X __pgprot(_PAGE_BASE | _PAGE_READ | > > _PAGE_EXEC) > > +/* Radix only, Hash uses PAGE_READONLY_X + execute-only pkey > > instead */ > > +#define PAGE_EXECONLY __pgprot(_PAGE_BASE | _PAGE_EXEC) > > > > /* Permission masks used for kernel mappings */ > > #define PAGE_KERNEL __pgprot(_PAGE_BASE | _PAGE_KERNEL_RW) > > diff --git a/arch/powerpc/mm/book3s64/pgtable.c > > b/arch/powerpc/mm/book3s64/pgtable.c > > index 7b9966402b25..62f63d344596 100644 > > --- a/arch/powerpc/mm/book3s64/pgtable.c > > +++ b/arch/powerpc/mm/book3s64/pgtable.c > > @@ -553,8 +553,15 @@ EXPORT_SYMBOL_GPL(memremap_compat_align); > > > > pgprot_t vm_get_page_prot(unsigned long vm_flags) > > { > > - unsigned long prot = pgprot_val(protection_map[vm_flags & > > - > > (VM_READ|VM_WRITE|VM_EXEC|VM_ > > SHARED)]); > > + unsigned long prot; > > + > > + /* Radix supports execute-only, but protection_map maps X - > > > RX */ > > + if (radix_enabled() && ((vm_flags & > > (VM_READ|VM_WRITE|VM_EXEC)) == VM_EXEC)) { > > Maybe use VM_ACCESS_FLAGS ? I was looking for something like that but only checked powerpc, thanks. > > > + prot = pgprot_val(PAGE_EXECONLY); > > + } else { > > + prot = pgprot_val(protection_map[vm_flags & > > + > > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]); > > + } > > > > if (vm_flags & VM_SAO) > > prot |= _PAGE_SAO; > > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > > index 014005428687..59e4cbcf3109 100644 > > --- a/arch/powerpc/mm/fault.c > > +++ b/arch/powerpc/mm/fault.c > > @@ -270,7 +270,11 @@ static bool access_error(bool is_write, bool > > is_exec, struct vm_area_struct *vma > > return false; > > } > > > > - if (unlikely(!vma_is_accessible(vma))) > > + /* On Radix, a read fault could be from PROT_NONE or > > PROT_EXEC */ > > + if (unlikely(radix_enabled() && !(vma->vm_flags & > > VM_READ))) > > + return true; > > Why do you need the radix_enabled() here ? > Even if it doesn't fault directly, reading a non readable area is > still > an error and should be handled as such, even on hardware that will > not > generate a fault for it at the first place. So I'd just do: > > if (!(vma->vm_flags & VM_READ))) > return true; > I don't think we need it, just concerned I might break something. I can do that. > > + /* Check for a PROT_NONE fault on other MMUs */ > > + else if (unlikely(!vma_is_accessible(vma))) > > return true; > > /* > > * We should ideally do the vma pkey access check here. But > > in the > > Don't use an if/else construct, there is no other 'else' in that > function, or in similar functions like bad_kernel_fault() for > instance. > > So leave the !vma_is_accessible(vma) untouched and add your check as > a > standalone check before or after it. I think checking vma_is_accessible(vma) is redundant if we're checking for a read fault. It doesn't satisfy the Radix exec-only case because PROT_EXEC will be set, but as far as I can tell other MMUs don't have a no-read mode other than PROT_NONE. Unless I'm missing something, checking if PROT_READ is there should be enough.