Re: [kvm-devel] EMM: disable other notifiers before register and unregister

Andrea Arcangeli Thu, 03 Apr 2008 08:30:16 -0700

On Wed, Apr 02, 2008 at 06:24:15PM -0700, Christoph Lameter wrote:
> Ok let's forget about the single threaded thing to solve the registration 
> races. As Andrea pointed out this still has issues with other subscribed 
> subsystems (and also try_to_unmap). We could do something like what 
> stop_machine_run does: First disable all running subsystems before 
> registering a new one.
> 
> Maybe this is a possible solution.


It still doesn't solve this kernel crash.

   CPU0                         CPU1
   range_start (mmu notifier chain is empty)
   range_start returns
                                mmu_notifier_register
                                kvm_emm_stop (how kvm can ever know
                                the other cpu is in the middle of
                                the critical section?)
                                kvm page fault (kvm thinks
                                mmu_notifier_register serialized)
   zap ptes
   free_page mapped by spte/GRU and not pinned -> crash


There's no way the lowlevel can stop mmu_notifier_register and if
mmu_notifier_register returns, then sptes will be instantiated and
it'll corrupt memory the same way.

The seqlock was fine; what is wrong is the assumption that we can let
the lowlevel driver handle a range_end happening without a range_begin
before it. By design the lowlevel can't handle that. This is the core
kernel crashing problem we have (it's a kernel crashing issue only for
drivers that don't pin the pages, so XPMEM wouldn't crash, but it
would still leak memory, which is a more graceful failure than random
mm corruption).
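
To make the unpaired range_end concrete, here is a minimal userspace
sketch. Everything in it is hypothetical (struct drv_state and the
drv_* helpers are not taken from KVM, GRU or XPMEM); it only assumes a
driver that counts open critical sections and maps pages when the
count is zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical driver state, for illustration only. */
struct drv_state {
	int in_range;		/* range_begin calls not yet ended */
	bool unbalanced;	/* saw a range_end with no range_begin */
};

static void drv_range_begin(struct drv_state *s)
{
	s->in_range++;
}

static void drv_range_end(struct drv_state *s)
{
	if (s->in_range == 0) {
		/* Registered in the middle of somebody's critical section. */
		s->unbalanced = true;
		return;
	}
	s->in_range--;
}

/* The driver believes it may instantiate mappings when nothing is open. */
static bool drv_may_instantiate(const struct drv_state *s)
{
	return s->in_range == 0;
}
```

A driver that registers after the other cpu already ran range_begin
starts with in_range == 0, so drv_may_instantiate() says yes while the
ptes are still being zapped. By the time the lone range_end shows up
the damage is done; the driver can notice it but cannot undo it.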

The basic trouble is that sometimes the range_begin/end critical
sections run outside the mmap_sem (see try_to_unmap_cluster in #v10 or
even try_to_unmap_one only in EMM-V2).

My attempt to fix this once and for all is to walk all vmas of the
"mm" inside mmu_notifier_register and take all anon_vma locks and
i_mmap_locks in virtual address order in a row. It's ok to take those
inside the mmap_sem. Supposedly if anybody ever takes a double lock
they'll do it in order too. Then I can dump all the other locking and
remove the seqlock, and the driver is guaranteed there will be a
single call of range_begin followed by a single call of range_end the
whole time and no race could ever happen, and there won't be repeated
calls of range_begin that would screw up a recursive semaphore
locking.

The patch won't be pretty, I guess I'll vmalloc an array of pointers
to locks to reorder them. It doesn't need to be fast. Also the locks
can't go away from under us while we hold the down_write(mmap_sem)
because the vmas can be altered only with down_write(mmap_sem) (modulo
vm_start/vm_end that can be modified with only down_read(mmap_sem) +
page_table_lock like in growsdown page faults). So it should be ok to
take all those locks inside the mmap_sem and implement a lock_vm(mm)
unlock_vm(mm). I'll think more about this hammer approach while I try
to implement it...
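
A rough userspace sketch of the lock_vm/unlock_vm idea, not the actual
patch. It assumes pthread mutexes in place of the anon_vma lock and
i_mmap_lock, a made-up struct fake_vma in place of the vma list, and
malloc in place of vmalloc. Sorting by the address of each lock is
only one reading of "in order", and dropping duplicates is needed
because several vmas can share the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vma; either pointer may be NULL. */
struct fake_vma {
	pthread_mutex_t *anon_lock;	/* plays the role of the anon_vma lock */
	pthread_mutex_t *file_lock;	/* plays the role of i_mmap_lock */
};

/* The sorted, deduplicated locks currently held by lock_vm(). */
struct lock_set {
	pthread_mutex_t **locks;
	size_t n;
};

static int cmp_lock_addr(const void *a, const void *b)
{
	uintptr_t x = (uintptr_t)*(pthread_mutex_t *const *)a;
	uintptr_t y = (uintptr_t)*(pthread_mutex_t *const *)b;

	return (x > y) - (x < y);
}

/* Take every lock referenced by the vmas; 0 on success, -1 on ENOMEM. */
static int lock_vm(const struct fake_vma *vmas, size_t nr, struct lock_set *set)
{
	pthread_mutex_t **v;
	size_t i, n = 0, out = 0;

	set->locks = NULL;
	set->n = 0;
	if (nr == 0)
		return 0;
	v = malloc(2 * nr * sizeof(*v));
	if (!v)
		return -1;

	/* Collect pointers to all the locks. */
	for (i = 0; i < nr; i++) {
		if (vmas[i].anon_lock)
			v[n++] = vmas[i].anon_lock;
		if (vmas[i].file_lock)
			v[n++] = vmas[i].file_lock;
	}

	/* Reorder them so every caller locks in the same global order. */
	qsort(v, n, sizeof(*v), cmp_lock_addr);

	/* Shared locks show up more than once; keep one copy of each. */
	for (i = 0; i < n; i++)
		if (out == 0 || v[out - 1] != v[i])
			v[out++] = v[i];

	for (i = 0; i < out; i++)
		pthread_mutex_lock(v[i]);

	set->locks = v;
	set->n = out;
	return 0;
}

/* Release in reverse order and free the array. */
static void unlock_vm(struct lock_set *set)
{
	size_t i;

	for (i = set->n; i > 0; i--)
		pthread_mutex_unlock(set->locks[i - 1]);
	free(set->locks);
	set->locks = NULL;
	set->n = 0;
}
```

In the real thing the caller would already hold down_write(mmap_sem),
which is what keeps the vmas (and so the set of locks) from changing
between the collect step and unlock_vm.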

