On Tue, May 01, 2012 at 05:02:06PM -0400, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
> > On Tue, May 01, 2012 at 01:41:36PM -0400, Mathieu Desnoyers wrote:
> > > * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
> > > > On Tue, May 01, 2012 at 11:21:44AM -0400, Mathieu Desnoyers wrote:
> > > > > * Mathieu Desnoyers (mathieu.desnoy...@efficios.com) wrote:
> > > > > > * Paul E. McKenney (paul...@linux.vnet.ibm.com) wrote:
> > > > > > > On Tue, May 01, 2012 at 10:16:09AM -0400, Mathieu Desnoyers wrote:
> > > > > > > > Hi!
> > > > > > > >
> > > > > > > > After 1 year of development, with the last 6-7 months spent
> > > > > > > > polishing the API and testing the implementation, I think it
> > > > > > > > is getting about time to release the RCU lock-free hash table
> > > > > > > > in a new Userspace RCU version (0.7).
> > > > > > > >
> > > > > > > > I recently described the guarantees provided by the hash
> > > > > > > > table in more detail, and created tests for the uniqueness
> > > > > > > > guarantee for traversals performed concurrently with
> > > > > > > > add_unique and add_replace operations. I also added test
> > > > > > > > modes that create long hash chains, to test corner cases of
> > > > > > > > the hash table.
> > > > > > > >
> > > > > > > > One small thing I wonder is whether we should document that
> > > > > > > > the hash table update operations imply full memory barriers?
> > > > > > >
> > > > > > > The Linux kernel's rule seems good here -- if a hash-table
> > > > > > > operation is atomic and returns a value, it should imply a
> > > > > > > full barrier. So:
> > > > > > >
> > > > > > > cds_lfht_new(): No point in claiming barriers -- publishing
> > > > > > > the pointer to the hash table is where the barriers are
> > > > > > > important.
> > > > > > >
> > > > > > > cds_lfht_destroy(): Ditto.
> > > > > > >
> > > > > > > cds_lfht_lookup(): Not an update (let alone an atomic update),
> > > > > > > no barriers.
> > > > > > >
> > > > > > > cds_lfht_next_duplicate(): Ditto.
> > > > > > >
> > > > > > > cds_lfht_first(): Ditto.
> > > > > > >
> > > > > > > cds_lfht_next(): Ditto.
> > > > > > >
> > > > > > > cds_lfht_add(): Atomic update, but no return value, so no
> > > > > > > barrier implied.
> > > > > >
> > > > > > Yep, makes sense. We use cmpxchg internally to perform the
> > > > > > update, but it could make sense to eventually use a cmpxchg
> > > > > > that has no memory barriers to perform this update. So I agree
> > > > > > on not providing a memory barrier guarantee on the "add"
> > > > > > operation, since it does not return any value.
> > > > > >
> > > > > > > cds_lfht_add_unique(): Atomic update that returns a value,
> > > > > > > so it should imply a full memory barrier.
> > > > > >
> > > > > > add_unique is a bit special:
> > > > > >
> > > > > > - if it returns the node received as parameter, it means the
> > > > > >   add succeeded, which implies an update, and thus a memory
> > > > > >   barrier.
> > > > >
> > > > > Hrm, thinking further: if we make the "add" operation not act as
> > > > > a full memory barrier, then the add_unique success should not
> > > > > act as a full mb either.
> > > >
> > > > Think of it as being similar to the Linux kernel's atomic_inc()
> > > > and atomic_add_return() primitives. The latter guarantees memory
> > > > barriers and the former does not.
> > >
> > > I think add/add_unique vs atomic_inc/atomic_add_return are
> > > fundamentally different.
> > >
> > > atomic_inc:
> > > - input: increment
> > > - output: none
> > > - effect: increment target address value
> > >
> > > atomic_add_return:
> > > - input: increment
> > > - output: incremented value
> > > - effect: increment target address value, atomically reading the
> > >   resulting value.
> > >
> > > hash table add:
> > > - input: new node to add
> > > - output: none
> > > - effect: atomically add the new node into the table
> > >
> > > hash table add_unique (success):
> > > - input: new node to add
> > > - output: (we just return whether the operation has succeeded)
> > > - effect: atomically add the new node into the table
> > >
> > > hash table add_unique (failure):
> > > - input: new node to try adding
> > > - output: (we just return whether the operation has succeeded)
> > > - effect: simple lookup (read)
> > >
> > > So, as we can see, the add_unique failure only performs a read.
> > > Adding a memory barrier before this read would require us to also
> > > add a memory barrier on the success path, which would degrade
> > > performance. The success path does: lookup failure, cmpxchg to add
> > > the node, retry if changed. Adding a memory barrier before the
> > > lookup would add an extra memory barrier in addition to the one
> > > located in the cmpxchg, and I don't think we want that overhead.
> >
> > Perhaps a better analogy is cmpxchg() in the Linux kernel. Some
> > architectures unconditionally place a memory barrier before, but
> > place a memory barrier after only if the cmpxchg() succeeds. Of
> > course, architectures for which the cmpxchg instruction implies a
> > barrier don't need any explicit memory-barrier instructions.
> >
> > I do see your point about the memory barrier in the failure case
> > requiring an additional memory barrier in the success case, however.
> >
> > > > > > - if it returns a different node than the one received as
> > > > > >   parameter, it failed, meaning that it only performed a
> > > > > >   lookup, so there is no guarantee that a memory barrier has
> > > > > >   been executed.
> > > > > >
> > > > > > > cds_lfht_add_replace(): Ditto.
> > > > > > >
> > > > > > > cds_lfht_replace(): Ditto.
> > > > > > >
> > > > > > > cds_lfht_del(): Ditto.
> > > > >
> > > > > One more point: "del" is similar to add_unique: if it succeeds,
> > > > > we execute a full memory barrier. If it fails, no memory
> > > > > barrier is guaranteed. But if we want to relax the guarantees,
> > > > > we might not want to guarantee that a memory barrier is present
> > > > > in any of the "del" cases.
> > > > >
> > > > > In the end, the only primitives for which I think it really
> > > > > makes sense to provide memory barrier semantics are add_replace
> > > > > and replace: they actually act as a higher-level "cmpxchg" over
> > > > > the hash table nodes.
> > > >
> > > > I believe that these should do the same -- memory barrier before,
> > > > but no memory barrier after on failure.
> > > >
> > > > Another approach is C++11, in which there are a couple of
> > > > arguments to the compare-and-swap primitive specifying the memory
> > > > ordering constraints for the success and failure cases,
> > > > respectively.
> > >
> > > Hrm, I think adding this kind of flexibility might make the API
> > > too cluttered with details that "usual cases" don't care about.
> >
> > ;-)
> >
> > > > Unless you expect use cases with lots of failing cds_lfht_del()
> > > > and add_unique() calls, the performance difference should not be
> > > > significant.
> > >
> > > The problem is that I expect very, very frequent failing
> > > add_unique calls for use cases like Binary Decision Diagram (BDD)
> > > creation. This is why I don't think it is appropriate to put
> > > memory barriers on the failure cases, as these should have no more
> > > overhead than a simple lookup.
> > >
> > > And if we choose not to provide memory barriers for add_unique
> > > failure, we should probably do the same for "del" to keep a
> > > consistent behavior across the API.
> > >
> > > Thoughts?
> >
> > Given your BDD example, letting the failure cases avoid memory
> > barriers seems reasonable.
> > If problems arise, the default interfaces can be fully ordered,
> > with faster versions where the failure cases avoid memory barriers.
>
> I think we agree on the failure cases of del and add_unique, which is
> good.
>
> Now, I'm thinking slightly further about the success cases of add,
> add_unique and del.
>
> Typically, when giving a node to add or add_unique for insertion into
> the hash table, we have to order the writes that initialize the node
> before its insertion into the hash table. So I think it would make
> sense to guarantee that we provide a memory barrier prior to add and
> add_unique (success) operations.
>
> For the "del" case, it's the other way around: if a deletion
> succeeds, we want a memory barrier between the removal-flag write and
> the following free() of the element. So we need to ensure that there
> is a memory barrier after a successful deletion.
>
> If we think about it, "add_replace" acts both as an "add" and a
> "del": in all cases, we need a memory barrier before the add_replace,
> because we need to order the node initialization before the atomic
> insertion into the hash table, and we also need a memory barrier
> after, but only really in the case where add_replace replaced an old
> node, thus acting as a delete of that old node.
>
> Considering all this, the barrier semantics I'm proposing are:
>
> add:
> - full memory barrier before the insertion.
>
> add_unique:
> - success: full memory barrier before the insertion.
> - failure: no memory barrier.
>
> replace:
> - success: full memory barrier before the insertion, and after the
>   removal of the old node.
> - failure: no memory barrier.
>
> add_replace:
> - returns an old node (replace): full memory barrier before the
>   insertion, and after the removal of the old node.
> - returns NULL (add): full memory barrier before the insertion.
>
> del:
> - success: full memory barrier after the removal.
> - failure: no memory barrier.
>
> I think those are the minimal barriers we need to provide to ensure
> that users will not have to add barriers of their own to use the API.
>
> Thoughts?
This means that some other CPU might see an add followed by a del as
happening in the opposite order. Is that really OK?

							Thanx, Paul

> Thanks,
>
> Mathieu
>
> > 							Thanx, Paul
> >
> > > Thanks,
> > >
> > > Mathieu
> > >
> > > > > Thoughts?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mathieu
> > > > >
> > > > > > Yep, makes sense.
> > > > > >
> > > > > > I'll add this documentation in the API.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Mathieu
> > > > > >
> > > > > > > cds_lfht_is_node_deleted(): Not an update (let alone an
> > > > > > > atomic update), no barriers.
> > > > > > >
> > > > > > > cds_lfht_resize(): Atomic update, but no return value, so
> > > > > > > no barrier implied.
> > > > > > >
> > > > > > > cds_lfht_for_each(): Not an update (let alone an atomic
> > > > > > > update), no barriers.
> > > > > > >
> > > > > > > cds_lfht_for_each_duplicate(): Ditto.
> > > > > > >
> > > > > > > cds_lfht_for_each_entry(): Ditto.
> > > > > > >
> > > > > > > cds_lfht_for_each_entry_duplicate(): Ditto.
> > > > > > >
> > > > > > > 						Thanx, Paul
> > > > > > >
> > > > > > > > I'm personally getting confident that the hash table API
> > > > > > > > is clean enough, and the implementation well tested, but
> > > > > > > > I'd like to have your thoughts on the readiness of the
> > > > > > > > hash table for production use.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Mathieu
> > > > > > > >
> > > > > > > > -- 
> > > > > > > > Mathieu Desnoyers
> > > > > > > > Operating System Efficiency R&D Consultant
> > > > > > > > EfficiOS Inc.
> > > > > > > > http://www.efficios.com
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev