The MCS-PDR locks have an optimization built in where most vcores ensure the lockholder runs. The other style is to ensure whoever is directly ahead in line (the predecessor) runs, which we fall back to in some tricky corner cases. That all is fine.
In the unlock case, the lockholder needs to ensure whoever is next in line runs (the lockholder's successor). Waiting on a_tail requires that everyone in line is ensuring their predecessor runs, which is not what happens by default. By clearing the lockholder_vcoreid, we can fall back to this "get out of corner cases by following the chain" approach. I've been chasing this one off and on for a couple years. I managed to recreate it once or twice and was able to peak at the userspace contexts. I could see all but one of them were calling ensure_vcore_runs(2), and VC 2 was calling ensure_vcore_runs(1). Two other vcores were preempted. I could see VC 2 was in an unlock. There were no syscalls coming from that process. It was actually simple after that. What happened was that a vcore signed up for the lock (L391), but hadn't set pred->next yet (L396). Then it gets preempted. VC 2 was its pred, and it acquired the lock with no issues. When it went to unlock, it needed to ensure it's successor was running. VC 2 was the lockholder_vcoreid. Everyone in the chain behind the preempted VC kept ensuring 2 ran. No one ensured the preempted VC ran. Signed-off-by: Barret Rhoden <[email protected]> --- user/parlib/mcs.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/user/parlib/mcs.c b/user/parlib/mcs.c index 7f03999dae0a..9c5a93459108 100644 --- a/user/parlib/mcs.c +++ b/user/parlib/mcs.c @@ -413,6 +413,7 @@ void __mcs_pdr_lock(struct mcs_pdr_lock *lock, struct mcs_pdr_qnode *qnode) * so they can hand us the lock. */ if (vcore_is_preempted(pred_vcoreid) || seq != __procinfo.coremap_seqctr) { + /* Note that we don't normally ensure our *pred* runs. */ if (lock->lockholder_vcoreid == MCSPDR_NO_LOCKHOLDER || lock->lockholder_vcoreid == vcore_id()) ensure_vcore_runs(pred_vcoreid); @@ -451,7 +452,13 @@ void __mcs_pdr_unlock(struct mcs_pdr_lock *lock, struct mcs_pdr_qnode *qnode) while (qnode->next == 0) { /* We need to get our next to run, but we don't know who they are. * If we make sure a tail is running, that will percolate up to make - * sure our qnode->next is running */ + * sure our qnode->next is running. + * + * But first, we need to tell everyone that there is no specific + * lockholder. lockholder_vcoreid is a short-circuit on the "walk + * the chain" PDR. Normally, that's okay. But now we need to make + * sure everyone is walking the chain from a_tail up to our pred. */ + lock->lockholder_vcoreid = MCSPDR_NO_LOCKHOLDER; ensure_vcore_runs(a_tail_vcoreid); cpu_relax(); } -- 2.6.0.rc2.230.g3dd15c0 -- You received this message because you are subscribed to the Google Groups "Akaros" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. For more options, visit https://groups.google.com/d/optout.
