On Wed, Jul 21, 2021, Will Deacon wrote:
> > For the page tables' liveness, KVM implements mmu_notifier_ops.release,
> > which is invoked at the beginning of exit_mmap(), before the page tables
> > are freed.  In its implementation, KVM takes mmu_lock and zaps all its
> > shadow page tables, a.k.a. the stage2 tables in KVM arm64.  The flow in
> > question, get_user_mapping_size(), also runs under mmu_lock, and so
> > effectively blocks exit_mmap() and thus is guaranteed to run with live
> > userspace tables.
> 
> Unless I missed a case, exit_mmap() only runs when mm_struct::mm_users drops
> to zero, right?

Yep.

> The vCPU tasks should hold references to that afaict, so I don't think it
> should be possible for exit_mmap() to run while there are vCPUs running with
> the corresponding page-table.

Ah, right, I was thinking of non-KVM code that operates on the page tables
without holding a reference to mm_users.

> > Looking at the arm64 code, one thing I'm not clear on is whether arm64
> > correctly handles the case where exit_mmap() wins the race.  The
> > invalidate_range hooks will still be called, so userspace page tables
> > aren't a problem, but kvm_arch_flush_shadow_all() -> kvm_free_stage2_pgd()
> > nullifies mmu->pgt without any additional notifications that I see.  x86
> > deals with this by ensuring its top-level TDP entry (stage2 equivalent) is
> > valid while the page fault handler is running.
> 
> But the fact that x86 handles this race has me worried. What am I missing?

I don't think you're missing anything.  I forgot that KVM_RUN would require
an elevated mm_users.  x86 does handle the impossible race, but that's
coincidental.  The extra protections in x86 are to deal with other cases
where a vCPU's top-level SPTE can be invalidated while the vCPU is running.
_______________________________________________
kvmarm mailing list
[email protected]
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
