On Mon, Jun 18, 2012 at 6:22 PM, Jakub Jermar <[email protected]> wrote:
> Hi Adam,
>
> On 06/15/2012 11:44 PM, Adam Hraska wrote:
>> Visibility vs MBs
>> -----------------
>> I base the following discussion on [18, 19, 20].
>>
>> In order for a store on one cpu (CPU_1) to become
>> visible to a load on another cpu (CPU_2):
>> 1) CPU_1 must execute a MB after the store (wrt
>>       instruction order).
>> 2) After CPU_1's MB completes (wrt cache-coherency
>>       bus traffic) CPU_2 must execute a MB. Then (wrt
>>       instruction order) CPU_2 can issue a load which
>>       is guaranteed to see the stored value.
>>
>> If either of the MBs is omitted, CPU_2 may never
>> load the stored value (even if it loads it in a loop).
>> In practice, CPU_2 will eventually see the new value.
>> Due to having a store buffer and an invalidate queue
>> limited in size, a cpu in the system would eventually
>> have to stall if stores were to be hidden indefinitely
>> [18].
>>
>> What I would like to know is how long it takes for
>> CPU_2 to first see CPU_1's store if CPU_2 omits its
>> MB, but CPU_1 does not. CPU_2's cache may be busy, so
>> its invalidate queue may not get to be processed [18].
>> However, CPU_2's performance is not affected. It
>> can continue working with its cache without worrying
>> about the inv. queue (that's why the queue is not
>> being processed by the cache in the first place - the
>> cpu is busy using its cache). Therefore, CPU_2 has no
>> motivation to process the queue. Even if the queue
>> becomes full, at worst some other cpu will stall but
>> not CPU_2.
>
> Interestingly, there are such examples in [18]:
>
> a = 0
> b = 0
>
> CPU_1:
>  a = 1;
>  mb();
>  b = 1;
>
> CPU_2:
>  while (b == 0) continue;
>  mb();
>  assert(a == 1);
>
> The two mb()'s are essential so that the assertion is not hit, but note
> that there is no mb() in front of the while loop, which essentially
> monitors a change of state of a shared variable without issuing either
> barrier.

Ahh, good catch, Jakub! I had not noticed it before.

> Therefore it seems to me like the barriers are only useful to ensure
> that all processors observe the same ordering of two or more memory
> operations.
>
> From [18] and your logic, it would seem that CPU_2 could theoretically
> loop indefinitely. I'd tend to think that on any reasonable CPU
> architecture this would be either impossible (by forcing processing of
> the invalidation queues every now and then) or highly unlikely
> (comparable to the potentially unbounded looping on a spinlock).

I agree. I decided not to worry about cache coherency (and its
latency).
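
To make the example from [18] quoted above concrete, here is a minimal
sketch in C11, assuming <stdatomic.h> and POSIX threads are available.
Treating mb() as atomic_thread_fence(memory_order_seq_cst) is an
assumption made for illustration, not how [18] or HelenOS defines it,
and run_message_passing() is a hypothetical helper name.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared variables from the quoted example, both initially 0. */
static atomic_int a;
static atomic_int b;

/* CPU_1: store a, full barrier, store b. */
static void *cpu_1(void *arg)
{
    (void) arg;
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* assumed stand-in for mb() */
    atomic_store_explicit(&b, 1, memory_order_relaxed);
    return NULL;
}

/* CPU_2: spin on b with no barrier in front of the loop, then barrier, then read a. */
static void *cpu_2(void *arg)
{
    int *seen_a = arg;

    while (atomic_load_explicit(&b, memory_order_relaxed) == 0)
        continue;
    atomic_thread_fence(memory_order_seq_cst);  /* assumed stand-in for mb() */
    *seen_a = atomic_load_explicit(&a, memory_order_relaxed);
    assert(*seen_a == 1);
    return NULL;
}

/*
 * Hypothetical helper: runs both sides once and returns the value of a
 * observed by cpu_2, or -1 if the cpu_2 thread could not be created.
 */
int run_message_passing(void)
{
    pthread_t t1, t2;
    int seen_a = -1;

    atomic_store(&a, 0);
    atomic_store(&b, 0);

    if (pthread_create(&t2, NULL, cpu_2, &seen_a) != 0)
        return -1;
    if (pthread_create(&t1, NULL, cpu_1, NULL) != 0)
        cpu_1(NULL);  /* fall back to running CPU_1's side in this thread */
    else
        pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return seen_a;
}
```

The two fences only order the accesses to a and b; they say nothing
about how soon the spinning load first observes b == 1. As far as I
know, C11 only recommends (it does not require) that atomic stores
become visible to atomic loads within a reasonable amount of time,
which matches the conclusion above.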

> Btw, [18] is an interesting read, thanks for sharing the link!

My pleasure :-).

Adam
