Hi Andrej,

Thank you very much for taking the time to look at my
comments regarding your algorithm.

I am afraid I still do not understand how the race can
be prevented. I am probably stuck and -- sorry to say
this -- you are most likely the only one who can get me
unstuck ;-). I would be grateful if you had a look at this
email as well and pointed out my logical error.


Hi Adam,

After checking your example much more carefully, including the source code, I 
have the impression that you are perfectly right. I would like to apologize for 
my statement of the form „if there was a bug of this magnitude, we would have 
already observed it“. The way I see it right now, there *is* a serious bug. 
(And the fact that it hasn't been observed yet doesn't matter at all.)

I remember I got stuck by similar thoughts quite many times in the past, but 
unfortunately, two years have elapsed and many thoughts (and also notes, 
unfortunately) are now irreversibly lost. :-( Nevertheless, right now I am 
convinced that the race condition you point out („a callback enqueued *after* 
GP detection and *before* the subsequent reclamation can be invoked prematurely 
with respect to a concurrent reader“) *can* really occur.

This leads to an obvious question: How is it possible that our tests and 
benchmarks did not produce a flood of error messages or crash immediately? 
Reading through the sources again with this particular scenario in mind yielded 
a possible answer: The problem is so unlikely to occur with our specific 
workload that it simply never occurred so far.

Our most thorough test that swaps two buffers repeatedly only works with the 
*blocking* API, which is unaffected by this problem. Although the callback API 
is used heavily throughout our hash table test/benchmark combination, the way 
the hash table keys are accessed (in a rather unpredictable fashion from 
*preemptible* kernel threads) probably makes the race so unlikely that it has 
*never* been revealed yet.

Furthermore, the reclaimers run under a very high priority. (They may be 
preempted from time to time due to their SYSDC scheduling class, but only 
extremely long callback queues could make this happen.) This fact *hides* the 
bug even deeper, making a scenario where a reclaimer is signaled by the 
detector *and* does *not* wake up almost immediately (providing room for 
callback B from your pseudocode to be enqueued and then moved to cur_cbs 
prematurely) highly improbable.

The thoughts above might (hopefully) explain why we haven't seen the 
pathological scenario so far.

Consequently, unless someone comes up with serious reasoning about the 
contrary, this problem will have to be fixed somehow:

        A) By swapping the waiting and signaling phases of the detector, as you 
suggested... But I have a strong impression that this might fail to exploit all 
the possible parallelism among the reclaimers and the detector.

        B) By using three lists of callbacks instead of two, with the obvious 
intention of delaying the callbacks one GP boundary further and avoiding the 
pathological scenario. This would be nothing new -- some RCU algorithms for 
Linux had three (or even more) queues of callbacks for similar reasons. Such a 
solution would make it possible to leave the blocking API implementation 
unchanged and only make a small modification to the reclaimers.

It would be nice if a third person (Martin?) could have a careful look at this. 
I have just got too confused to think about it any further.

Well done, Adam!

Andrej

Attachment: smime.p7s
Description: Elektronicky podpis S/MIME

_______________________________________________
HelenOS-devel mailing list
[email protected]
http://lists.modry.cz/cgi-bin/listinfo/helenos-devel

Reply via email to