>>>>> On Sun, 10 Apr 2005 08:43:24 +0200, Ingo Molnar <[EMAIL PROTECTED]> said:

  Ingo> * David S. Miller <[EMAIL PROTECTED]> wrote:

  >> > Yes, of course.  The deadlock was due to context-switching, not
  >> > switch_mm() per se.  Hopefully someone else beats me to
  >> > remembering the details before Monday.

  >> Sparc64 has a deadlock because we hold mm->page_table_lock during
  >> switch_mm().  I bet IA64 did something similar, as I remember it
  >> had a very similar locking issue in this area.

  >> So the deadlock was, we held the runqueue locks over switch_mm(),
  >> switch_mm() spins on mm->page_table_lock, the cpu which does have
  >> mm->page_table_lock tries to do a wakeup on the first cpu's
  >> runqueue.  Classic AB-BA deadlock.

  Ingo> yeah, i can see that happening - holding the runqueue lock and
  Ingo> enabling interrupts. (it's basically never safe to enable irqs
  Ingo> with the runqueue lock held.)

  Ingo> the patch drops both the runqueue lock and enables interrupts,
  Ingo> so this particular issue should not trigger.

I had to refresh my memory with a quick Google search that netted [1]
(look for "Disable interrupts during context switch").  Actually, it
wasn't really a deadlock, but rather a livelock, since a CPU got stuck
on an infinite page-not-present loop.

Fundamentally, the issue was that doing the switch_mm() and
switch_to() with interrupts enabled opened a window during which you
could get a call to flush_tlb_mm() (as a result of an IPI).  This, in
turn, could end up activating the wrong MMU-context, since the action
of flush_tlb_mm() depends on the value of current->active_mm.  The
problematic sequence was:

1) schedule() calls switch_mm() which calls activate_context() to
   switch to the new address-space
2) IPI comes in and flush_tlb_mm(mm) gets called
3) "current" still points to old task and if "current->active_mm == mm",
   activate_mm() is called for the old address-space, resetting the
   address-space back to that of the old task
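
To make the window concrete, here is a rough user-space model of steps
1-3.  It is only a sketch: struct mm, struct task, hw_context and
replay_old_order() are made-up stand-ins, not the real kernel types or
the actual ia64 code, and the real TLB flush work is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct mm { int id; };
struct task { struct mm *active_mm; };

static struct task *current_task;  /* models "current" */
static struct mm *hw_context;      /* address space the simulated MMU uses */

/* Models activate_context(): load mm into the simulated MMU. */
static void activate_context(struct mm *mm)
{
    hw_context = mm;
}

/* Models the IPI side: after flushing (omitted here), re-activate mm
 * if it is the active_mm of whatever "current" points to. */
static void flush_tlb_mm(struct mm *mm)
{
    if (current_task->active_mm == mm)
        activate_context(mm);
}

/* Replays the old order: switch_mm() first, switch_to() second, with an
 * optional IPI for the old mm landing in between.  Returns the address
 * space the simulated MMU ends up with. */
static struct mm *replay_old_order(struct task *prev, struct task *next,
                                   int ipi_in_window)
{
    assert(prev->active_mm != NULL && next->active_mm != NULL);
    current_task = prev;
    hw_context = prev->active_mm;

    activate_context(next->active_mm);   /* step 1: switch_mm() */
    if (ipi_in_window)
        flush_tlb_mm(prev->active_mm);   /* steps 2-3: IPI for old mm */
    current_task = next;                 /* switch_to() completes */

    return hw_context;
}
```

Without the IPI the model ends up on the new task's address space; with
the IPI in the window it ends up back on the old one while "current"
already points to the new task, which is the mismatch described above.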

Now, Ingo says that the order is reversed with his patch, i.e.,
switch_mm() happens after switch_to().  That means flush_tlb_mm() may
now see a current->active_mm which hasn't really been activated yet.
That should be OK since it would just mean that we'd do an early (and
duplicate) activate_context().  While it does not give me a warm and
fuzzy feeling to have this inconsistent state be observable by
interrupt-handlers (and, in particular, IPI-handlers), I don't see any
problem with it off hand.
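
For what it's worth, the same kind of toy model can replay the reversed
order.  Again, this is a sketch under assumptions: the types and
replay_new_order() are hypothetical, and it takes the order as I
understand it from Ingo's description (switch_to() before switch_mm())
rather than from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct mm { int id; };
struct task { struct mm *active_mm; };

static struct task *current_task;  /* models "current" */
static struct mm *hw_context;      /* address space the simulated MMU uses */
static int activations;            /* activate_context() calls per replay */

/* Models activate_context(): load mm into the simulated MMU. */
static void activate_context(struct mm *mm)
{
    hw_context = mm;
    activations++;
}

/* Models the IPI side: re-activate mm if it is current's active_mm. */
static void flush_tlb_mm(struct mm *mm)
{
    if (current_task->active_mm == mm)
        activate_context(mm);
}

/* Replays the assumed new order: switch_to() first, switch_mm() second.
 * ipi_mm is the mm targeted by an IPI arriving in between, or NULL for
 * no IPI.  Returns the address space the simulated MMU ends up with. */
static struct mm *replay_new_order(struct task *prev, struct task *next,
                                   struct mm *ipi_mm)
{
    assert(prev->active_mm != NULL && next->active_mm != NULL);
    current_task = prev;
    hw_context = prev->active_mm;
    activations = 0;

    current_task = next;                 /* switch_to() happens first */
    if (ipi_mm != NULL)
        flush_tlb_mm(ipi_mm);            /* may activate next's mm early */
    activate_context(next->active_mm);   /* switch_mm(): possible duplicate */

    return hw_context;
}
```

In this model an IPI for the new mm only causes one extra (early)
activation, and an IPI for the old mm does nothing, so the final state
is the new address space in every case.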

        --david

[1] http://www.gelato.unsw.edu.au/linux-ia64/0307/6109.html