Re: [RFC PATCH 1/2] x86: Don't let pgprot_modify() change the page encryption bit

Andy Lutomirski Tue, 10 Sep 2019 21:18:53 -0700

On Tue, Sep 10, 2019 at 12:26 PM Thomas Hellström (VMware)
<[email protected]> wrote:
>
> On 9/10/19 6:11 PM, Andy Lutomirski wrote:
> >
> >> On Sep 5, 2019, at 8:24 AM, Christoph Hellwig <[email protected]> wrote:
> >>
> >>> On Thu, Sep 05, 2019 at 05:21:24PM +0200, Thomas Hellström (VMware) wrote:
> >>>> On 9/5/19 4:15 PM, Dave Hansen wrote:
> >>>> Hi Thomas,
> >>>>
> >>>> Thanks for the second batch of patches!  These look much improved on all
> >>>> fronts.
> >>> Yes, although the TTM functionality isn't in yet. Hopefully we won't have 
> >>> to
> >>> bother you with those though, since this assumes TTM will be using the dma
> >>> API.
> >> Please take a look at dma_mmap_prepare and dma_mmap_fault in this
> >> branch:
> >>
> >>     
> >> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-mmap-improvements
> >>
> >> they should allow to fault dma api pages in the page fault handler.  But
> >> this is totally hot off the press and not actually tested for the last
> >> few patches.  Note that I've also included your two patches from this
> >> series to handle SEV.
> > I read that patch, and it seems like you’ve built in the assumption that 
> > all pages in the mapping use identical protection or, if not, that the same 
> > fake vma hack that TTM already has is used to fudge around it.  Could it be 
> > reworked slightly to avoid this?
> >
> > I wonder if it’s a mistake to put the encryption bits in vm_page_prot at 
> > all.
>
>  From my POW, the encryption bits behave quite similar in behaviour to
> the caching mode bits, and they're also in vm_page_prot. They're the
> reason TTM needs to modify the page protection in the fault handler in
> the first place.
>
> The problem seen in TTM is that we want to be able to change the
> vm_page_prot from the fault handler, but it's problematic since we have
> the mmap_sem typically only in read mode. Hence the fake vma hack. From
> what I can tell it's reasonably well-behaved, since pte_modify() skips
> the bits TTM updates, so mprotect() and mremap() works OK. I think
> split_huge_pmd may run into trouble, but we don't support it (yet) with
> TTM.


One thing I'm confused about: does TTM move individual pages between
main memory and device memory or does it move whole mappings?  If it
moves individual pages, then a single mapping could have PTEs from
dma_alloc_coherent() space and from PCI space.  How can this work with
vm_page_prot?  I guess you might get lucky and have both have the same
protection bits, but that seems like an unfortunate thing to rely on.

As a for-real example, take a look at arch/x86/entry/vdso/vma.c.  The
"vvar" VMA contains multiple pages that are backed by different types
of memory.  There's a page of ordinary kernel memory.  Historically
there was a page of genuine MMIO memory, but that's gone now.  If the
system is a SEV guest with pvclock enabled, then there's a page of
decrypted memory.   They all share a VMA, they're instantiated in
.fault, and there is no hackery involved.  Look at vvar_fault().

So, Christoph, can't you restructure your code a bit to compute the
caching and encryption state per page instead of once in
dma_mmap_prepare() and insert the pages accordingly?  You might need
to revert 6d958546ff611c9ae09b181e628c1c5d5da5ebda depending on
exactly how you structure it.

>
> We could probably get away with a WRITE_ONCE() update of the
> vm_page_prot before taking the page table lock since
>
> a) We're locking out all other writers.
> b) We can't race with another fault to the same vma since we hold an
> address space lock ("buffer object reservation")
> c) When we need to update there are no valid page table entries in the
> vma, since it only happens directly after mmap(), or after an
> unmap_mapping_range() with the same address space lock. When another
> reader (for example split_huge_pmd()) sees a valid page table entry, it
> also sees the new page protection and things are fine.
>
> But that would really be a special case. To solve this properly we'd
> probably need an additional lock to protect the vm_flags and
> vm_page_prot, taken after mmap_sem and i_mmap_lock.
>

This is all horrible IMO.

Re: [RFC PATCH 1/2] x86: Don't let pgprot_modify() change the page encryption bit

Reply via email to