Re: [HelenOS-devel] Fwd: Re: RCU algorithm review

Adam Hraska Mon, 18 Jun 2012 05:11:52 -0700

On Mon, Jun 18, 2012 at 12:33 AM, Adam Hraska <[email protected]> wrote:
> On Sun, Jun 17, 2012 at 1:22 PM, Andrej Podzimek <[email protected]> 
> wrote:
>> To sum up, maybe I am missing something, but I really cannot see a race
>> condition here.
>>
>> If there was a race condition of this magnitude, the hash table benchmarks
>> (combined with thorough tests using bit patterns) would probably crash after
>> milliseconds. However, they have neither crashed nor observed any
>> inconsistent data modifications so far. (Admittedly, they have only run for
>> about two weeks on all the machines combined so far...)
>
> ;-). You can increase the likelihood of demonstrating
> the race if you test with hiwater mark set (so that
> the detector does not sleep). Moreover, implementation
> details of condition variables on OpenSolaris may influence
> how often you encounter the race in practice.
>
> If you want to see the race in action you could insert
> a rcu_call into rcu_advance_callbacks() (perfectly valid
> due to enabled preemption and interrupts). The following
> should work even on a uniprocessor and *without* a high
> water mark. Something in the spirit of:
>
> // Wait for the detector and reclaimers to reach
> // initial position.
> demonstrate_race = 0
> sleep(1 second)
>
> // Single normal rcu_call during the whole test.
> demonstrate_race = 1
> rcu_call(A)
>
> rcu_reclaimer() {
>        // ..
>        if(1 == demonstrate_race) {
>                rcu_read_lock();


Oops, rcu_read_lock() disables preemption, so it would have
to be invoked in another thread on another cpu. As a result,
the race cannot be demonstrated on a uniprocessor with this
test.

>                // Add B() at the right time.
>                rcu_call(B)
>                ASSERT(nextlist == {A, B})
>                // Don't lock or add B next time.
>                demonstrate_race = 2
>        }
>        rcu_advance_callbacks()
>        // ..
> }
>
> B() {
>        panic("Whoops, the race is real")
> }
>
> If there is no race, the test will deadlock. Otherwise,
> it will panic.
>
>
>>> If I am not mistaken, the implementation [6] checks
>>> for threads being idle or in user space with
>>> CPU->cpu_mstate in OpenSolaris. However, OpenSolaris
>>> does not surround changes of cpu_mstate with MBs (for
>>> performance reasons [7]); therefore, reading its value
>>> from the detector is racy (e.g. if a CS immediately
>>> follows the change).
>>
>>
>> It might seem prone to races with one CS immediately following an idle /
>> userspace state. However, this is why the GP boundaries are announced rather
>> than GP "endings". The race condition you mention will never occur twice in
>> a sequence. Yet RCU callbacks (as well as external threads asking for GP
>> detection) are always delayed until at least *two* GP boundaries are
>> observed.
>
> I do not think this race is related to the number of
> GP boundaries the detector waits for. The problem is
> that you may exclude a cpu from checking its rcu_gp_ctr
> because its mstate has not yet even propagated to the
> detector thread. Therefore, the detector does not even
> check if a QS was announced or not although there may
> be readers running after mstate's change. Note that
> OpenSolaris assumes that SPARC's writes are ordered,
> so it is not going to be that simple to reproduce this
> race.
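
To make the ordering concern concrete, here is a minimal C11 sketch
of the two sides. It is only an illustration under stated assumptions:
struct cpu_model, enter_kernel() and detector_may_skip() are
hypothetical names, not the OpenSolaris or RCU implementation's code,
and the fences show where full barriers would be needed rather than
what the real code does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the microstates kept in CPU->cpu_mstate. */
enum cpu_mode { MS_USER, MS_IDLE, MS_KERNEL };

/* Hypothetical per-CPU model; only the field relevant to the race. */
struct cpu_model {
    _Atomic int mstate;
};

static void cpu_model_init(struct cpu_model *c, enum cpu_mode m)
{
    atomic_init(&c->mstate, (int)m);
}

/*
 * Reader side: switch to kernel mode before a read-side CS.
 * Assumption: a full fence follows the store. Without it, the store
 * may still be buffered while loads inside the CS already execute,
 * so the detector can observe a stale MS_USER or MS_IDLE.
 */
static void enter_kernel(struct cpu_model *c)
{
    atomic_store_explicit(&c->mstate, MS_KERNEL, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Detector side: decide whether this CPU may be excluded from the
 * rcu_gp_ctr check. The fence orders the detector's earlier writes
 * before the load of mstate.
 */
static bool detector_may_skip(struct cpu_model *c)
{
    atomic_thread_fence(memory_order_seq_cst);
    int s = atomic_load_explicit(&c->mstate, memory_order_relaxed);
    return s == MS_USER || s == MS_IDLE;
}
```

With both fences in place, either the detector sees MS_KERNEL and
checks the cpu, or the reader's CS sees everything the detector wrote
before its fence. Remove the fence in enter_kernel() and neither is
guaranteed on a machine that reorders a store with a later load.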

Btw, why is running in user space considered a quiescent
state? Do transitions from/to user mode contain memory
barriers on OpenSolaris?

Thank you for your time, Andrej.

Adam

