Hi Andrej, Thank you very much for taking the time to look at my comments regarding your algorithm.I am afraid I still do not understand how the race can be prevented. I am probably stuck and -- sorry to say this -- you are most likely the only one who can get me unstuck ;-). I would be grateful if you had a look at this email as well and pointed out my logical error.
Hi Adam,
After checking your example much more carefully, including the source code, I
have the impression that you are perfectly right. I would like to apologize for
my statement of the form „if there was a bug of this magnitude, we would have
already observed it“. The way I see it right now, there *is* a serious bug.
(And the fact that it hasn't been observed yet doesn't matter at all.)
I remember I got stuck by similar thoughts quite many times in the past, but
unfortunately, two years have elapsed and many thoughts (and also notes,
unfortunately) are now irreversibly lost. :-( Nevertheless, right now I am
convinced that the race condition you point out („a callback enqueued *after*
GP detection and *before* the subsequent reclamation can be invoked prematurely
with respect to a concurrent reader“) *can* really occur.
This leads to an obvious question: How is it possible that our tests and
benchmarks did not produce a flood of error messages or crash immediately?
Reading through the sources again with this particular scenario in mind yielded
a possible answer: The problem is so unlikely to occur with our specific
workload that it simply never occurred so far.
Our most thorough test that swaps two buffers repeatedly only works with the
*blocking* API, which is unaffected by this problem. Although the callback API
is used heavily throughout our hash table test/benchmark combination, the way
the hash table keys are accessed (in a rather unpredictable fashion from
*preemptible* kernel threads) probably makes the race so unlikely that it has
*never* been revealed yet.
Furthermore, the reclaimers run under a very high priority. (They may be
preempted from time to time due to their SYSDC scheduling class, but only
extremely long callback queues could make this happen.) This fact *hides* the
bug even deeper, making a scenario where a reclaimer is signaled by the
detector *and* does *not* wake up almost immediately (providing room for
callback B from your pseudocode to be enqueued and then moved to cur_cbs
prematurely) highly improbable.
The thoughts above might (hopefully) explain why we haven't seen the
pathological scenario so far.
Consequently, unless someone comes up with serious reasoning about the
contrary, this problem will have to be fixed somehow:
A) By swapping the waiting and signaling phases of the detector, as you
suggested... But I have a strong impression that this might fail to exploit all
the possible parallelism among the reclaimers and the detector.
B) By using three lists of callbacks instead of two, with the obvious
intention of delaying the callbacks one GP boundary further and avoiding the
pathological scenario. This would be nothing new -- some RCU algorithms for
Linux had three (or even more) queues of callbacks for similar reasons. Such a
solution would make it possible to leave the blocking API implementation
unchanged and only make a small modification to the reclaimers.
It would be nice if a third person (Martin?) could have a careful look at this.
I have just got too confused to think about it any further.
Well done, Adam!
Andrej
smime.p7s
Description: Elektronicky podpis S/MIME
_______________________________________________ HelenOS-devel mailing list [email protected] http://lists.modry.cz/cgi-bin/listinfo/helenos-devel
