On Tue, May 20, 2014 at 02:13:58PM -0700, Jarno Rajahalme wrote:
>
> On May 20, 2014, at 2:03 PM, Ben Pfaff <[email protected]> wrote:
>
> > On Tue, May 20, 2014 at 01:22:45PM -0700, Jarno Rajahalme wrote:
> >>
> >> On May 20, 2014, at 9:48 AM, Ben Pfaff <[email protected]> wrote:
> >>
> >>> On Tue, May 20, 2014 at 09:12:23AM -0700, Ben Pfaff wrote:
> >>>> On Fri, May 16, 2014 at 04:38:02PM -0700, Jarno Rajahalme wrote:
> >>>>>> +    if (b->nodes[slot] == node) {
> >>>>>> +        cmap_set_bucket(b, slot, cmap_node_next_protected(node), hash);
> >>>>>
> >>>>> 'hash' is not changing here, so could just set the nodes:
> >>>>>
> >>>>>     b->nodes[slot] = cmap_node_next_protected(node);
> >>>>>
> >>>>> Btw, what is the rationale for the nodes pointers not being RCU
> >>>>> pointers?  If they were, it would seem possible to combine this
> >>>>> special case with the loop below.
> >>>>
> >>>> Good points. I'll work on that for a v3.
> >>>
> >>> After thinking a little further, I am not sure that it would become
> >>> possible to combine them, because I think that the cases are a little
> >>> different:
> >>>
> >>>    * If we are removing the last node with a hash, which is usually
> >>>      the case if node == b->nodes[slot], then we want to make sure
> >>>      that, from the viewpoint of any reader, the change is atomic
> >>>      (that is, the change to ->hashes[] and ->nodes[] together), by
> >>>      incrementing the counter around the change.  I am not
> >>>      absolutely certain that this is required, but the cost is
> >>>      minimal so, lacking confidence, I prefer to do it.
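
To make this concrete, the writer-side pattern in question is the
seqlock-style sequence below.  This is only a sketch, and the memory orders
shown are illustrative, since they are exactly what is under discussion:

    /* Force the counter odd, change the bucket, then make the counter even
     * again.  A reader that observes the same even counter before and after
     * its reads saw a consistent bucket. */
    static void
    cmap_set_bucket(struct cmap_bucket *b, int i,
                    struct cmap_node *node, uint32_t hash)
    {
        uint32_t c;

        atomic_read_explicit(&b->counter, &c, memory_order_acquire);
        atomic_store_explicit(&b->counter, c + 1, memory_order_release); /* Odd. */
        b->nodes[i] = node;
        b->hashes[i] = hash;
        atomic_store_explicit(&b->counter, c + 2, memory_order_release); /* Even. */
    }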
> >>
> >> My point was that the hash need not be changed, if the dup insertion
> >> code is also changed to not care about the node value (see the other
> >> comment I just sent).
> >
> > OK. Like this?
> >
> > diff --git a/lib/cmap.c b/lib/cmap.c
> > index 30a6e2d..d291ec5 100644
> > --- a/lib/cmap.c
> > +++ b/lib/cmap.c
> > @@ -648,7 +648,8 @@ cmap_remove__(struct cmap_impl *impl, struct cmap_node *node,
> >      }
> >
> >      if (b->nodes[slot] == node) {
> > -        cmap_set_bucket(b, slot, cmap_node_next_protected(node), hash);
> > +        b->nodes[slot] = cmap_node_next_protected(node);
> > +        atomic_thread_fence(memory_order_release);
>
> Yes.  Would an ovsrcu_set() include the release barrier, if nodes[]
> were RCU pointers?  (This relates to my earlier question about why
> the nodes pointers are not RCU pointers.)
Yes.
Maybe we really should make nodes[] into RCU pointers.
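
Something like this, perhaps (an untested sketch: ovsrcu_set() stores with
release semantics, which would replace the explicit fence above, and readers
would go through the ovsrcu_get() accessor):

    /* Hypothetical bucket layout with RCU node pointers. */
    struct cmap_bucket {
        atomic_uint32_t counter;
        uint32_t hashes[CMAP_K];
        OVSRCU_TYPE(struct cmap_node *) nodes[CMAP_K];
    };

    /* Writer: a single release-ordered store, no separate fence needed. */
    ovsrcu_set(&b->nodes[slot], cmap_node_next_protected(node));

    /* Reader: an ordered load through the RCU accessor. */
    struct cmap_node *node = ovsrcu_get(struct cmap_node *, &b->nodes[i]);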
> >      } else {
> >          struct cmap_node *iter = b->nodes[slot];
> >          for (;;) {
> >>>
> >>> * Otherwise, we are shortening a linked list, but not
> >>> eliminating its slot, which does not affect readers in the
> >>> same way, so an ordinary RCU store should suffice.
> >>>
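
For what it's worth, if nodes[] and the node 'next' pointers were RCU
pointers (hypothetical at this point), the list-shortening case could look
roughly like this sketch:

    /* Walk the chain and unlink 'node' with a single ordered store.  A
     * concurrent reader sees either 'node' or its successor, never a torn
     * chain, so no counter increment is needed for this case. */
    struct cmap_node *iter = ovsrcu_get_protected(struct cmap_node *,
                                                  &b->nodes[slot]);
    for (;;) {
        struct cmap_node *next = cmap_node_next_protected(iter);

        if (next == node) {
            ovsrcu_set(&iter->next, cmap_node_next_protected(node));
            break;
        }
        iter = next;
    }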
> >>> What I'm increasingly uncertain about is whether cmap_find() is correct.
> >>> The intention is that the atomic reads of the counters before and after
> >>> checking the nodes and the hashes should ensure that the cache lines
> >>> occupied by the buckets are stable. I think that's going to be true in
> >>> practice, with current compiler technology.  But I am not sure that the
> >>> atomic_reads on the counters actually ensure that the node and hash reads
> >>> can't be moved outside the counter reads.  If not, though, making the
> >>> node reads atomic (via RCU) and even the hash reads atomic (by making
> >>> them atomic_uint32s) wouldn't help. I think that the only thing that
> >>> would help would be adding explicit acquire and release barriers. That
> >>> might actually, in conjunction with good comments, be clearer than what
> >>> we have now.
> >>>
> >>> What do you think?
> >>
> >> atomic_read() implies memory_order_seq_cst, which is stronger than
> >> memory_order_acquire.  A memory_order_acquire read guarantees that no
> >> memory operations after the atomic_read are moved before it, so we
> >> read the data only after reading an even counter.  When we re-read
> >> the counter to verify that it has not changed, an acquire barrier
> >> would guarantee that no memory operations after the check are moved
> >> before it, but it would still be possible for the memory operations
> >> before it to be moved after it.  So the check needs a release
> >> barrier, even though we are only reading, to guarantee that memory
> >> operations before the check are not moved after it.  The
> >> memory_order_seq_cst implied by atomic_read() does that, but it is
> >> too strong: memory_order_acq_rel should suffice, or even
> >> memory_order_acquire for read_even_counter and memory_order_release
> >> for a 'check_counter()'.  Makes sense?
> >
> > Yes. Like this?
> >
> > diff --git a/lib/cmap.c b/lib/cmap.c
> > index 30a6e2d..d291ec5 100644
> > --- a/lib/cmap.c
> > +++ b/lib/cmap.c
> > @@ -245,11 +245,11 @@ cmap_is_empty(const struct cmap *cmap)
> >  }
> >
> >  static uint32_t
> > -read_counter(struct cmap_bucket *bucket)
> > +read_counter(struct cmap_bucket *bucket, memory_order order)
> >  {
> >      uint32_t counter;
> >
> > -    atomic_read(&bucket->counter, &counter);
> > +    atomic_read_explicit(&bucket->counter, &counter, order);
> >      return counter;
> >  }
> >
> > @@ -259,7 +259,7 @@ read_even_counter(struct cmap_bucket *bucket)
> >      uint32_t counter;
> >
> >      do {
> > -        counter = read_counter(bucket);
> > +        counter = read_counter(bucket, memory_order_acquire);
> >      } while (OVS_UNLIKELY(counter & 1));
> >
> >      return counter;
> > @@ -291,7 +291,7 @@ retry:
> >      for (i = 0; i < CMAP_K; i++) {
> >          struct cmap_node *node = b1->nodes[i];
> >          if (node && b1->hashes[i] == hash) {
> > -            if (OVS_UNLIKELY(read_counter(b1) != c1)) {
> > +            if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1)) {
>
> I would find the code more balanced if read_even_counter() were
> paired with a 'read_same_counter()', rather than with a raw
> read_counter().  To be more specific, the explicit barrier should be
> visible or hidden by a utility function the same way in both cases.
Fair enough. I folded this in:
diff --git a/lib/cmap.c b/lib/cmap.c
index dcb82fc..bd1d516 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -265,6 +265,12 @@ read_even_counter(struct cmap_bucket *bucket)
     return counter;
 }
 
+static bool
+counter_changed(const struct cmap_bucket *b, uint32_t c)
+{
+    return OVS_UNLIKELY(read_counter(b, memory_order_release) != c);
+}
+
 /* Searches 'cmap' for an element with the specified 'hash'.  If one or more is
  * found, returns a pointer to the first one, otherwise a null pointer.  All of
  * the nodes on the returned list are guaranteed to have exactly the given
@@ -291,7 +297,7 @@ retry:
     for (i = 0; i < CMAP_K; i++) {
         struct cmap_node *node = b1->nodes[i];
         if (node && b1->hashes[i] == hash) {
-            if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1)) {
+            if (counter_changed(b1, c1)) {
                 goto retry;
             }
             return node;
@@ -303,15 +309,14 @@ retry:
     for (i = 0; i < CMAP_K; i++) {
         struct cmap_node *node = b2->nodes[i];
         if (node && b2->hashes[i] == hash) {
-            if (OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
+            if (counter_changed(b2, c2)) {
                 goto retry;
             }
             return node;
         }
     }
 
-    if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1) ||
-        OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
+    if (counter_changed(b1, c1) || counter_changed(b2, c2)) {
         goto retry;
     }
     return NULL;
> >                  goto retry;
> >              }
> >              return node;
> > @@ -303,15 +303,15 @@ retry:
> >      for (i = 0; i < CMAP_K; i++) {
> >          struct cmap_node *node = b2->nodes[i];
> >          if (node && b2->hashes[i] == hash) {
> > -            if (OVS_UNLIKELY(read_counter(b2) != c2)) {
> > +            if (OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
> >                  goto retry;
> >              }
> >              return node;
> >          }
> >      }
> >
> > -    if (OVS_UNLIKELY(read_counter(b1) != c1) ||
> > -        OVS_UNLIKELY(read_counter(b2) != c2)) {
> > +    if (OVS_UNLIKELY(read_counter(b1, memory_order_release) != c1) ||
> > +        OVS_UNLIKELY(read_counter(b2, memory_order_release) != c2)) {
> >          goto retry;
> >      }
> >      return NULL;
> >
> > I realize that all these incrementals are a big mess. I've pushed the
> > overall series to my "ovs-reviews" repo at
> > https://github.com/blp/ovs-reviews in the "cmap" branch. I'll happily
> > repost a v3 if that is easiest for you.
>
> I'd rather you tell me that you have addressed the questions I raised
> and push it :-)  I think it would be best at this point to get this in
> and gain some experience with it.  We can then enhance it as we see
> opportunities for it.
Thanks.
I pushed this patch. We can fine-tune it (e.g. making nodes[] RCU
pointers) later.
I didn't push patch 2 yet since you pointed out a bug.