On Mon, Jun 18, 2012 at 6:22 PM, Jakub Jermar <[email protected]> wrote:
> Hi Adam,
>
> On 06/15/2012 11:44 PM, Adam Hraska wrote:
>> Visibility vs MBs
>> -----------------
>> I base the following discussion on [18, 19, 20].
>>
>> In order for a store on one cpu (CPU_1) to become
>> visible to a load on another cpu (CPU_2):
>> 1) CPU_1 must execute a MB after the store (wrt
>>    instruction order).
>> 2) After CPU_1's MB completes (wrt cache-coherency
>>    bus traffic) CPU_2 must execute a MB. Then (wrt
>>    instruction order) CPU_2 can issue a load which
>>    is guaranteed to see the stored value.
>>
>> If either of the MBs is omitted, CPU_2 may never
>> load the stored value (even if it loads it in a loop).
>> In practice, CPU_2 will eventually see the new value.
>> Due to having a store buffer and an invalidate queue
>> limited in size, a cpu in the system would eventually
>> have to stall if stores were to be hidden indefinitely
>> [18].
>>
>> What I would like to know is how long it takes for
>> CPU_2 to first see CPU_1's store if CPU_2 omits its
>> MB, but CPU_1 does not. CPU_2's cache may be busy, so
>> its invalidate queue may not get to be processed [18].
>> However, CPU_2's performance is not affected. It
>> can continue working with its cache without worrying
>> about the inv. queue (that's why the queue is not
>> being processed by the cache in the first place - the
>> cpu is busy using its cache). Therefore, CPU_2 has no
>> motivation to process the queue. Even if the queue
>> becomes full, at worst some other cpu will stall but
>> not CPU_2.
>
> Interestingly, there are such examples in [18]:
>
> a = 0
> b = 0
>
> CPU_1:
>   a = 1;
>   mb();
>   b = 1;
>
> CPU_2:
>   while (b == 0) continue;
>   mb();
>   assert(a == 1);
>
> The two mb()'s are essential so that the assertion is not hit, but note
> that there is no mb() in front of the while loop, which essentially
> monitors a change of state of a shared variable without issuing either
> barrier.
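
For anyone who wants to try the quoted example from [18], here is a minimal user-space sketch in C11. It rests on several assumptions that are not part of the original discussion: mb() is modelled as atomic_thread_fence(memory_order_seq_cst), the plain accesses to a and b become relaxed atomics (so the C program has no data race), POSIX threads stand in for the two cpus, and run_message_passing() is a hypothetical driver name. It only illustrates the ordering argument, it says nothing about how long the polling loop may spin.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Shared variables from the quoted example, both initially 0. */
static atomic_int a;
static atomic_int b;

/* Assumption: a sequentially consistent fence plays the role of mb(). */
static void mb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* CPU_1: publish a, then signal through b. */
static void cpu_1(void)
{
	atomic_store_explicit(&a, 1, memory_order_relaxed);
	mb();	/* keeps the store to a ahead of the store to b */
	atomic_store_explicit(&b, 1, memory_order_relaxed);
}

/* CPU_2: poll b with no barrier in front of the loop, then read a. */
static void *cpu_2(void *arg)
{
	int *seen_a = arg;

	while (atomic_load_explicit(&b, memory_order_relaxed) == 0)
		continue;
	mb();	/* keeps the load of b ahead of the load of a */
	*seen_a = atomic_load_explicit(&a, memory_order_relaxed);
	return NULL;
}

/*
 * Hypothetical driver: runs cpu_2 in a new thread and cpu_1 in the
 * calling thread. Returns the value of a observed by cpu_2, or -1 if
 * the thread could not be created.
 */
int run_message_passing(void)
{
	pthread_t t2;
	int seen_a = -1;

	atomic_store(&a, 0);
	atomic_store(&b, 0);

	if (pthread_create(&t2, NULL, cpu_2, &seen_a) != 0)
		return -1;
	cpu_1();
	pthread_join(t2, NULL);

	return seen_a;
}
```

With both fences in place, the C11 memory model guarantees that cpu_2 reads a == 1 once it has seen b == 1, so run_message_passing() returns 1 whenever the thread is created. Dropping either mb() call removes that guarantee, which mirrors the assertion in the quoted example.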

Ahh, good catch, Jakub! I had not noticed it before.

> Therefore it seems to me like the barriers are only useful to ensure
> that all processors observe the same ordering of two or more memory
> operations.
>
> From [18] and your logic, it would seem that CPU_2 could theoretically
> loop indefinitely. I'd tend to think that on any reasonable CPU
> architecture this would be either impossible (by forcing processing of
> the invalidation queues every now and then) or highly unlikely
> (comparable to the potentially unbounded looping on a spinlock).

I agree. I decided not to worry about cache coherency (and its
latency).

> Btw, [18] is an interesting read, thanks for sharing the link!

My pleasure :-).

Adam
_______________________________________________
HelenOS-devel mailing list
[email protected]
http://lists.modry.cz/cgi-bin/listinfo/helenos-devel
