On Mon, Jun 18, 2012 at 12:33 AM, Adam Hraska wrote:
> On Sun, Jun 17, 2012 at 1:22 PM, Andrej Podzimek wrote:
>> To sum up, maybe I am missing something, but I really cannot see a race
>> condition here.
>>
>> If there was a race condition of this magnitude, the hash table benchmarks
>> (combined with thorough tests using bit patterns) would probably crash after
>> milliseconds. However, they have neither crashed nor observed any
>> inconsistent data modifications so far. (Admittedly, they have only run for
>> about two weeks on all the machines combined so far...)
>
> ;-). You can increase the likelihood of demonstrating
> the race if you test with the hiwater mark set (so that
> the detector does not sleep). Moreover, implementation
> details of condition variables on OpenSolaris may influence
> how often you encounter the race in practice.
>
> If you want to see the race in action, you could insert
> an rcu_call into rcu_advance_callbacks() (perfectly valid,
> since preemption and interrupts are enabled there). The
> following should work even on a uniprocessor and *without*
> a high water mark. Something in the spirit of:
>
> // Wait for the detector and reclaimers to reach
> // initial position.
> demonstrate_race = 0
> sleep(1 second)
>
> // Single normal rcu_call during the whole test.
> demonstrate_race = 1
> rcu_call(A)
>
> rcu_reclaimer() {
> // ..
> if(1 == demonstrate_race) {
> rcu_read_lock();
Oops: rcu_read_lock() disables preemption, so it would have
to be invoked by another thread on another CPU. As a result,
the race cannot be demonstrated on a uniprocessor with this
test.
> // Add B() at the right time.
> rcu_call(B)
> ASSERT(nextlist == {A, B})
> // Don't lock or add B next time.
> demonstrate_race = 2
> }
> rcu_advance_callbacks
> // ..
> }
>
> B() {
> panic("Whoops, the race is real")
> }
>
> If there is no race, the test will deadlock. Otherwise,
> it will panic.
>
>
>>> If I am not mistaken, the implementation [6] checks
>>> whether threads are idle or in user space via
>>> CPU->cpu_mstate in OpenSolaris. However, OpenSolaris
>>> does not surround changes of cpu_mstate with MBs (for
>>> performance reasons [7]); therefore, reading its value
>>> from the detector is racy (e.g. if a CS immediately
>>> follows the change).
>>
>>
>> It might seem prone to races when a CS immediately follows an idle /
>> userspace state. However, this is why the GP boundaries are announced rather
>> than GP "endings". The race condition you mention will never occur twice in
>> a row. Moreover, RCU callbacks (as well as external threads asking for GP
>> detection) are always delayed until at least *two* GP boundaries are
>> observed.
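To make the "two GP boundaries" rule concrete, here is a minimal model in C, assuming each callback is stamped with the number of boundaries the detector had announced when the callback was registered. The names (model_rcu_call, model_announce_boundary, the stamp field) are hypothetical and do not come from the HelenOS or OpenSolaris code. The sketch only shows the rule itself: a callback registered between boundary k and k+1 runs once boundary k+2 is announced, because the interval between k+1 and k+2 is a complete GP that began after the registration. With per-callback stamps, a late callback such as B in the test above cannot be released together with an earlier one such as A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not the code under discussion. */
typedef struct callback {
    unsigned long stamp;   /* GP boundaries announced at registration */
    bool ran;              /* set once the callback has been "invoked" */
} callback_t;

static unsigned long boundaries;   /* GP boundaries announced so far */

/* Register a callback: remember the current boundary count. */
static void model_rcu_call(callback_t *cb)
{
    assert(cb != NULL);
    cb->stamp = boundaries;
    cb->ran = false;
}

/* Eligible only after two further boundaries: the span between the
 * first and the second one is a full GP started after registration. */
static bool model_ready(const callback_t *cb)
{
    return boundaries >= cb->stamp + 2;
}

/* Detector announces a new GP boundary and runs eligible callbacks. */
static void model_announce_boundary(callback_t *cbs[], size_t n)
{
    boundaries++;
    for (size_t i = 0; i < n; i++) {
        if (!cbs[i]->ran && model_ready(cbs[i]))
            cbs[i]->ran = true;   /* stand-in for invoking the callback */
    }
}
```

With A registered at boundary 0 and B at boundary 1, A becomes eligible when boundary 2 is announced and B only at boundary 3, regardless of which list either callback happens to sit on.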
>
> I do not think this race is related to the number of
> GP boundaries the detector waits for. The problem is
> that the detector may skip checking a CPU's rcu_gp_ctr
> because the change of that CPU's mstate has not yet
> propagated to the detector thread. The detector then does
> not even check whether a QS was announced, although
> readers may already be running after the change of mstate.
> Note that OpenSolaris assumes that SPARC's writes are
> ordered, so it is not going to be that simple to reproduce
> this race.
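The mstate race has the shape of the classic store-buffering litmus test: the CPU stores its new cpu_mstate and then loads shared data inside a CS, while the updater unlinks an element and the detector then loads cpu_mstate. Below is a minimal C11 sketch with two pthreads; mstate and elem are hypothetical stand-ins for CPU->cpu_mstate and an RCU-protected element, not OpenSolaris code. Even TSO hardware (x86, SPARC in TSO mode) may hold each store in a store buffer past the following load, so without a full fence on *both* sides the unsafe outcome (reader still sees the element, detector sees an idle CPU and skips it) is permitted. With atomic_thread_fence(memory_order_seq_cst) on both sides, the C11 memory model forbids it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not OpenSolaris code:
 * mstate: 0 = idle/user, 1 = kernel (models CPU->cpu_mstate)
 * elem:   1 = element still linked, 0 = unlinked by the updater */
static atomic_int mstate;
static atomic_int elem;
static bool use_fences;
static int reader_saw_elem;
static int detector_saw_mstate;

/* Reader CPU: leave the idle/user state, then read inside a CS. */
static void *reader(void *arg)
{
    (void)arg;
    atomic_store_explicit(&mstate, 1, memory_order_relaxed);
    if (use_fences)
        atomic_thread_fence(memory_order_seq_cst); /* store before load */
    reader_saw_elem = atomic_load_explicit(&elem, memory_order_relaxed);
    return NULL;
}

/* Updater/detector: unlink the element, then sample the CPU's mstate. */
static void *detector(void *arg)
{
    (void)arg;
    atomic_store_explicit(&elem, 0, memory_order_relaxed);
    if (use_fences)
        atomic_thread_fence(memory_order_seq_cst); /* store before load */
    detector_saw_mstate = atomic_load_explicit(&mstate, memory_order_relaxed);
    return NULL;
}

/* Count runs ending in the unsafe outcome: the reader still sees the
 * element while the detector believes the CPU is idle. */
static int count_unsafe(int trials, bool fences)
{
    int unsafe = 0;
    use_fences = fences;
    for (int i = 0; i < trials; i++) {
        pthread_t r, d;
        int rc;
        atomic_store(&mstate, 0);
        atomic_store(&elem, 1);
        rc = pthread_create(&r, NULL, reader, NULL);
        assert(rc == 0);
        rc = pthread_create(&d, NULL, detector, NULL);
        assert(rc == 0);
        (void)rc;
        pthread_join(r, NULL);
        pthread_join(d, NULL);
        if (reader_saw_elem == 1 && detector_saw_mstate == 0)
            unsafe++;
    }
    return unsafe;
}
```

A fence on the detector side alone is not sufficient; the CPU must also order its mstate store before its first load in the CS, which is exactly the cost OpenSolaris avoids [7]. This harness starts the threads one after another, so it may rarely or never observe the unsafe outcome even without fences; it only illustrates that the fenced variant cannot produce it. Build with `cc -std=c11 -pthread`.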
Btw, why is running in user space considered a quiescent
state? Do transitions from/to user mode contain memory
barriers on OpenSolaris?
Thank you for your time, Andrej.
Adam
_______________________________________________
HelenOS-devel mailing list
http://lists.modry.cz/cgi-bin/listinfo/helenos-devel