More comments. We maintain stats in lots of places: struct rule, struct facet, struct subfacet, struct xout_cache. It's not obvious how all these differ and how we shuffle them all around.
Every caller of lookup_xc() creates a new xc entry on failure. Can any of this be factored into a helper? If not, then all the callers of create_xc() use very similar 3 lines of code, so can that be integrated into create_xc()? A struct xout_cache is pretty big and the initial xzalloc() is followed pretty quickly by copying into its '->xout->wc' and then (in insert_xc()) into '->flow'. Can we save time by initializing only what we need? (Most notably we never need to zero the 256 bytes in xc->xout.odp_actions_stub.) handle_flow_miss_without_facet() previously translated the actions for each packet. Now, it translates the actions once (or not at all, if there's an xc already). Won't this prevent slow protocols (e.g. CFM, LACP, BFD) from working properly, because packets after the first won't go through translation? I guess the unit tests pass; maybe I am missing something (or maybe the unit tests pass because they never trigger the governor). Both forks of the main "if" in handle_flow_miss_without_facet() contain the same xlate_out_copy() call. I think it can be factored out. Actually I think it can be moved into the body of "if (op->xout.odp_actions.size) {", and then we don't need the xlate_out_uninit() in the "else" case. In handle_flow_miss_without_facet(), is it necessary to do the lookup_xc() inside the LIST_FOR_EACH loop? That is, can we move it above the loop and then just use 'xc' for the whole body of the loop? All the callers of rule_dpif_lookup() that provide a nonnull 'wc' call flow_wildcards_init_catchall() on that 'wc' immediately beforehand. Should rule_dpif_lookup() do it internally? I see opportunity for optimization in facet_create(). I don't know whether it would have any effect on real performance. Anyway: the 'wc' is only used if a new 'xc' must be created, and we can figure that out based on just miss->flow. So we could theoretically save time by passing NULL to rule_dpif_lookup() if there's an existing 'xc'. The commit changes facet_create()'s comment to mention a 'wc' parameter, but facet_create() doesn't have a 'wc' parameter. facet_lookup_valid() now appears to always iterate over every xout_cache entry (!). Partly I think that's just a mistake (it should not call invalidate_xc_tags() if the revalidate_set is empty, or invalidate_xc_tags() should do nothing if its revalidate_set parameter is empty). But, why does it need to do that at all? That is, why can't it just see whether 'facet->xc->xout.tags' intersects the revalidate_set, at least as an initial check? New code in facet_revalidate() looks like this: xc = lookup_xc(ofproto, &facet->flow); if (!xc) { ovs_assert(!facet->xc); xlate_in_init(&xin, ofproto, &facet->flow, &facet->initial_vals, new_rule, 0, NULL); xc = create_xc(&wc, &facet->initial_vals, facet); xlate_actions(&xin, &xc->xout); insert_xc(ofproto, xc, &facet->flow); refresh_subfacets = true; } else if (!facet->xc) { add_facet_to_xc(xc, facet); refresh_subfacets = true; } I am not sure about the "fall through" case at the end. I have a suspicion that, in this case, we always have xc == facet->xc, because I think that if the xc might have changed, then the caller would have destroyed that cache entry. If that is correct, then we can delete a lot of code from facet_revalidate() that only fires if xc != facet->xc. But I am not entirely convinced that is correct. I am not sure that revalidation does the right thing if only the slow path of a translation changes. This new code will do a lot of unneeded kernel datapath flow reinstallations. For example, if need_revalidate becomes true, then it will reinstall every flow in the kernel datapath, even if their ODP actions do not change. This becomes extra-expensive because it does the reinstallations one-by-one instead of batching them up. (The current code also does reinstallations one by one, but it has not been a problem because it only reinstalls flows whose actions change, which is rare.) In xout_cache_push_stats(), in the call to memset(), can we just write 0 instead of '\0'? The latter looks like a string null terminator, which seems weird here. I don't understand this comment in facet_push_stats__(): /* Use local stats instead of 'facet->xc->stats' because this * function may be called multiple times before * xout_cache_push_stats(), which leads to high counts. */ The split between the OFPACT_LEARN case in do_xlate_actions() and the implementation in xlate_learn_action() seems more awkward than before. Can we move the assignment to has_learn into xlate_learn_action()? I think that OFPACT_SET_IPV4_SRC, OFPACT_SET_IPV4_DST, OFPACT_SET_IPV4_DSCP, OFPACT_SET_L4_SRC_PORT, OFPACT_SET_L4_DST_PORT, and OFPACT_DEC_TTL need to add dependencies to 'wc', because their behavior differs based on the protocol. For example, OFPACT_SET_IPV4_SRC is a no-op if the Ethertype is anything other than IPv4, so I would think it would need to mark the Ethertype as a dependency during translation. I think that OFPACT_OUTPUT_REG needs to depend on the field it reads. I think that OFPACT_POP_QUEUE needs to mark the skb_priority as a dependency. Did you consider dependencies for the MPLS actions? destroy_xc() contains a doubled semicolon ";;" at the end of a statement. Thanks, Ben. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev