On Sat, May 05, 2012 at 02:22:12PM -0400, Mathieu Desnoyers wrote: > * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote: > > On Fri, May 04, 2012 at 12:53:12PM -0400, Mathieu Desnoyers wrote: > > > * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote: > > > > On Thu, May 03, 2012 at 01:13:30PM -0400, Mathieu Desnoyers wrote: > > > > > * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote: > > > [...] > > > > > > > > > > > > A write barrier would be sufficient in the case where there were > > > > > > only > > > > > > two threads observing each other. A full memory barrier would be > > > > > > needed > > > > > > to prevent the assertion from firing in this sort of case (not sure > > > > > > that > > > > > > this is exactly right, but something like this): > > > > > > > > > > > > Initial contents: B, C > > > > > > > > > > > > T0: add A; del B > > > > > > T1: if (!lookup B) { add B; del C } > > > > > > T2: r1 = lookup C; smp_mb(); r2 = lookup A > > > > > > > > > > > > assert(lookup C || lookup A); > > > > > > > > > > What you are bringing here as counter-example is, I think, > > > > > transitivity. > > > > > > > > Yep! > > > > > > > > > I'm trying to figure out how your example could fail, and I cannot see > > > > > how. Follows a detail of the closest scenario I get to failing is the > > > > > following, but it does not fail. After that, I'm proposing a different > > > > > scenario, which I think will be more appropriate for the current > > > > > topic. > > > > > > > > > > Attempted detail of your scenario: > > > > > > > > > > T0 T1 T2 > > > > > > > > > > add A > > > > > wmb > > > > > (add A globally > > > > > observable) > > > > > > > > Almost. All that this really guarantees is that if someone sees > > > > the "add B", they will also see the "add A". > > > > > > > > > del B > > > > > (del B globally > > > > > observable) > > > > > (add A NOT brought > > > > > into cache) > > > > > (del B brought into > > > > > cache) > > > > > read B > > > > > (test false) > > > > > add B > > > > > wmb > > > > > (add B globally > > > > > observable) > > > > > del C > > > > > (del C globally > > > > > observable) > > > > > > > > Here, if someone sees the "del C", they will see the "add B", and > > > > they also will have lost the opportunity to modify B before T1 > > > > reads from it and modifies it. > > > > > > > > > (add A NOT brought > > > > > into cache) > > > > > (del C brought into > > > > > cache) > > > > > read C -> not there. > > > > > mb > > > > > (add A brought > > > > > into cache) > > > > > read A -> there -> success. > > > > > > > > So we see that C is not there. We know that B would be there if > > > > we looked at it. But we don't look at B, we look at A. But the > > > > ordering back to T0's "add A" requires transitivity, which wmb > > > > does not guarantee. > > > > > > OK, got it! > > > > > > > > > > > > If I look at the "transitivity" section in Linux memory-barriers.txt, > > > > > I > > > > > notice that the example is mainly around using read barrier around > > > > > loads > > > > > rather than general barrier. Let's see if I can modify that example to > > > > > come up with an example error case: > > > > > > > > > > Initial content: empty > > > > > > > > > > T0: add X > > > > > T1: r1 = lookup X; smp_mb; r2 = lookup Y > > > > > T2: add Y; r3 = lookup X > > > > > > > > > > assert( !(r1 && !r2 && !r3) ) > > > > > > > > > > The key thing here is that if the barrier in T2 after "add Y" is a > > > > > smp_wmb rather than a smp_mb, this could allow the "r3 = lookup X" to > > > > > be > > > > > reordered before add Y, thus allowing the assertion to fail. > > > > > > > > Your example is simpler, and demonstrates the need just as well, so > > > > let's go with your example. > > > > > > > > > I think it would be more intuitive for users if lookups vs updates > > > > > performed on the same thread are ordered with full memory barriers. > > > > > Given that we don't want to add extra barriers in read operations, it > > > > > would make sense to guarantee full memory barriers before and after > > > > > updates. > > > > > > > > > > So how about we use full memory barriers before and after each of: > > > > > add, > > > > > del (success), add_unique (success), replace, and add_replace ? If we > > > > > ever want to relax those ordering guarantees, then we can always add > > > > > new > > > > > update APIs with a "weaker" ordering. > > > > > > > > > > Thoughts ? > > > > > > > > That line of reasoning makes a lot of sense to me! > > > > > > Sounds good. > > > > > > Here is what I propose, thoughts ? > > > > Just to make sure I understand -- the reason that the "del" functions > > say "no memory barrier" instead of "acts like rcu_dereference()" is > > that the "del" functions don't return anything. > > Hrm, you bring an interesting point. I think we should change the two > "rcu_dereference()" in _cds_lfht_del for "CMM_LOAD_SHARED()". The > difference between the two is that CMM_LOAD_SHARED() does not imply a > read barrier between the read and following uses of the data pointed to > by the pointer read. > > Same thing for "cds_lfht_is_node_deleted": the rcu_dereference() should > be changed for a CMM_LOAD_SHARED(), because we never use the loaded > pointer as a pointer to other data. There are a few other locations > where the pointer is only used for its flags. > > Here is what I propose: > > diff --git a/rculfhash.c b/rculfhash.c > index b9f795f..6e27436 100644 > --- a/rculfhash.c > +++ b/rculfhash.c > @@ -923,7 +923,7 @@ int _cds_lfht_replace(struct cds_lfht *ht, unsigned long > size, > bucket = lookup_bucket(ht, size, > bit_reverse_ulong(old_node->reverse_hash)); > _cds_lfht_gc_bucket(bucket, new_node); > > - assert(is_removed(rcu_dereference(old_node->next))); > + assert(is_removed(CMM_LOAD_SHARED(old_node->next)));
This is good. > return 0; > } > > @@ -1061,7 +1061,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long > size, > * logical removal flag). Return -ENOENT if the node had > * previously been removed. > */ > - next = rcu_dereference(node->next); > + next = CMM_LOAD_SHARED(node->next); > if (caa_unlikely(is_removed(next))) > return -ENOENT; > assert(!is_bucket(next)); As long as "next" is not dereferenced anywhere, this is good. Which appears to be the case. > @@ -1082,7 +1082,7 @@ int _cds_lfht_del(struct cds_lfht *ht, unsigned long > size, > bucket = lookup_bucket(ht, size, bit_reverse_ulong(node->reverse_hash)); > _cds_lfht_gc_bucket(bucket, node); > > - assert(is_removed(rcu_dereference(node->next))); > + assert(is_removed(CMM_LOAD_SHARED(node->next))); This one is fine as well. > /* > * Last phase: atomically exchange node->next with a version > * having "REMOVAL_OWNER_FLAG" set. If the returned node->next > @@ -1510,7 +1510,7 @@ void cds_lfht_lookup(struct cds_lfht *ht, unsigned long > hash, > } > node = clear_flag(next); > } > - assert(!node || !is_bucket(rcu_dereference(node->next))); > + assert(!node || !is_bucket(CMM_LOAD_SHARED(node->next))); As is this one. > iter->node = node; > iter->next = next; > } > @@ -1543,7 +1543,7 @@ void cds_lfht_next_duplicate(struct cds_lfht *ht, > cds_lfht_match_fct match, > } > node = clear_flag(next); > } > - assert(!node || !is_bucket(rcu_dereference(node->next))); > + assert(!node || !is_bucket(CMM_LOAD_SHARED(node->next))); Same here. > iter->node = node; > iter->next = next; > } > @@ -1565,7 +1565,7 @@ void cds_lfht_next(struct cds_lfht *ht, struct > cds_lfht_iter *iter) > } > node = clear_flag(next); > } > - assert(!node || !is_bucket(rcu_dereference(node->next))); > + assert(!node || !is_bucket(CMM_LOAD_SHARED(node->next))); And here. > iter->node = node; > iter->next = next; > } > @@ -1668,7 +1668,7 @@ int cds_lfht_del(struct cds_lfht *ht, struct > cds_lfht_node *node) > > int cds_lfht_is_node_deleted(struct cds_lfht_node *node) > { > - return is_removed(rcu_dereference(node->next)); > + return is_removed(CMM_LOAD_SHARED(node->next)); And also here. > } > > static > > If it's ok for you, I will first commit this change, and then commit the > memory barrier documentation with your reviewed-by. Looks good! Thanx, Paul > Thanks! > > Mathieu > > > > > Assuming my understanding is correct: > > > > Reviewed-by: Paul E. McKenney <paul...@linux.vnet.ibm.com> > > > > Thanx, Paul > > > > > diff --git a/urcu/rculfhash.h b/urcu/rculfhash.h > > > index 2d8a310..2938e5e 100644 > > > --- a/urcu/rculfhash.h > > > +++ b/urcu/rculfhash.h > > > @@ -203,6 +203,7 @@ void cds_lfht_count_nodes(struct cds_lfht *ht, > > > * > > > * Call with rcu_read_lock held. > > > * Threads calling this API need to be registered RCU read-side threads. > > > + * This function acts as a rcu_dereference() to read the node pointer. > > > */ > > > void cds_lfht_lookup(struct cds_lfht *ht, unsigned long hash, > > > cds_lfht_match_fct match, const void *key, > > > @@ -226,6 +227,7 @@ void cds_lfht_lookup(struct cds_lfht *ht, unsigned > > > long hash, > > > * node returned by a previous cds_lfht_next. > > > * Call with rcu_read_lock held. > > > * Threads calling this API need to be registered RCU read-side threads. > > > + * This function acts as a rcu_dereference() to read the node pointer. > > > */ > > > void cds_lfht_next_duplicate(struct cds_lfht *ht, > > > cds_lfht_match_fct match, const void *key, > > > @@ -239,6 +241,7 @@ void cds_lfht_next_duplicate(struct cds_lfht *ht, > > > * Output in "*iter". *iter->node set to NULL if table is empty. > > > * Call with rcu_read_lock held. > > > * Threads calling this API need to be registered RCU read-side threads. > > > + * This function acts as a rcu_dereference() to read the node pointer. > > > */ > > > void cds_lfht_first(struct cds_lfht *ht, struct cds_lfht_iter *iter); > > > > > > @@ -252,6 +255,7 @@ void cds_lfht_first(struct cds_lfht *ht, struct > > > cds_lfht_iter *iter); > > > * pointing to the last table node. > > > * Call with rcu_read_lock held. > > > * Threads calling this API need to be registered RCU read-side threads. > > > + * This function acts as a rcu_dereference() to read the node pointer. > > > */ > > > void cds_lfht_next(struct cds_lfht *ht, struct cds_lfht_iter *iter); > > > > > > @@ -264,6 +268,8 @@ void cds_lfht_next(struct cds_lfht *ht, struct > > > cds_lfht_iter *iter); > > > * This function supports adding redundant keys into the table. > > > * Call with rcu_read_lock held. > > > * Threads calling this API need to be registered RCU read-side threads. > > > + * This function issues a full memory barrier before and after its > > > + * atomic commit. > > > */ > > > void cds_lfht_add(struct cds_lfht *ht, unsigned long hash, > > > struct cds_lfht_node *node); > > > @@ -288,6 +294,12 @@ void cds_lfht_add(struct cds_lfht *ht, unsigned long > > > hash, > > > * to add keys into the table, no duplicated keys should ever be > > > * observable in the table. The same guarantee apply for combination of > > > * add_unique and add_replace (see below). > > > + * > > > + * Upon success, this function issues a full memory barrier before and > > > + * after its atomic commit. Upon failure, this function acts like a > > > + * simple lookup operation: it acts as a rcu_dereference() to read the > > > + * node pointer. The failure case does not guarantee any other memory > > > + * barrier. > > > */ > > > struct cds_lfht_node *cds_lfht_add_unique(struct cds_lfht *ht, > > > unsigned long hash, > > > @@ -321,6 +333,9 @@ struct cds_lfht_node *cds_lfht_add_unique(struct > > > cds_lfht *ht, > > > * schemes will never generate duplicated keys. It also allows us to > > > * guarantee that a combination of add_replace and add_unique updates > > > * will never generate duplicated keys. > > > + * > > > + * This function issues a full memory barrier before and after its > > > + * atomic commit. > > > */ > > > struct cds_lfht_node *cds_lfht_add_replace(struct cds_lfht *ht, > > > unsigned long hash, > > > @@ -352,6 +367,10 @@ struct cds_lfht_node *cds_lfht_add_replace(struct > > > cds_lfht *ht, > > > * > > > * The semantic of replacement vs lookups is the same as > > > * cds_lfht_add_replace(). > > > + * > > > + * Upon success, this function issues a full memory barrier before and > > > + * after its atomic commit. Upon failure, this function does not issue > > > + * any memory barrier. > > > */ > > > int cds_lfht_replace(struct cds_lfht *ht, > > > struct cds_lfht_iter *old_iter, > > > @@ -377,6 +396,9 @@ int cds_lfht_replace(struct cds_lfht *ht, > > > * After successful removal, a grace period must be waited for before > > > * freeing the memory reserved for old node (which can be accessed with > > > * cds_lfht_iter_get_node). > > > + * Upon success, this function issues a full memory barrier before and > > > + * after its atomic commit. Upon failure, this function does not issue > > > + * any memory barrier. > > > */ > > > int cds_lfht_del(struct cds_lfht *ht, struct cds_lfht_node *node); > > > > > > @@ -391,6 +413,7 @@ int cds_lfht_del(struct cds_lfht *ht, struct > > > cds_lfht_node *node); > > > * function. > > > * Call with rcu_read_lock held. > > > * Threads calling this API need to be registered RCU read-side threads. > > > + * This function does not issue any memory barrier. > > > */ > > > int cds_lfht_is_node_deleted(struct cds_lfht_node *node); > > > > > > @@ -400,6 +423,7 @@ int cds_lfht_is_node_deleted(struct cds_lfht_node > > > *node); > > > * @new_size: update to this hash table size. > > > * > > > * Threads calling this API need to be registered RCU read-side threads. > > > + * This function does not (necessarily) issue memory barriers. > > > */ > > > void cds_lfht_resize(struct cds_lfht *ht, unsigned long new_size); > > > > > > @@ -407,6 +431,7 @@ void cds_lfht_resize(struct cds_lfht *ht, unsigned > > > long new_size); > > > * Note: it is safe to perform element removal (del), replacement, or > > > * any hash table update operation during any of the following hash > > > * table traversals. > > > + * These functions act as rcu_dereference() to read the node pointers. > > > */ > > > #define cds_lfht_for_each(ht, iter, node) > > > \ > > > for (cds_lfht_first(ht, iter), \ > > > > > > Thanks, > > > > > > Mathieu > > > > > > -- > > > Mathieu Desnoyers > > > Operating System Efficiency R&D Consultant > > > EfficiOS Inc. > > > http://www.efficios.com > > > > > > _______________________________________________ > > > rp mailing list > > > r...@svcs.cs.pdx.edu > > > http://svcs.cs.pdx.edu/mailman/listinfo/rp > > > > > > > > > _______________________________________________ > > lttng-dev mailing list > > lttng-dev@lists.lttng.org > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > -- > Mathieu Desnoyers > Operating System Efficiency R&D Consultant > EfficiOS Inc. > http://www.efficios.com > _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev