On Thu, Dec 05, 2013 at 04:27:26PM -0800, Jarno Rajahalme wrote:
> Add a prefix tree (trie) structure for tracking the used address
> space, enabling skipping classifier tables containing longer masks
> than necessary for an address field value in a packet header being
> classified.  This enables less unwildcarding for datapath flows in
> parts of the address space without host routes.
> 
> Trie lookup is interwoven to the staged lookup, so that a trie is
> searched only when the configured trie field becomes relevant
> for the lookup.  The trie lookup results are retained so that each
> trie is checked at most once for each classifier lookup.
> 
> This implementation tracks the number of rules at each address prefix
> for the whole classifier.  More aggressive table skipping would be
> possible by maintaining lists of tables that have prefixes at the
> lengths encountered on tree traversal, or by maintaining separate
> tries for subsets of rules separated by metadata fields.
> 
> Prefix tracking is configured via OVSDB.  A new column "prefixes" is
> added to the database table "Flow_Table".  "prefixes" is a set of
> string values listing the field names for which prefix lookup should
> be used.
> 
> As of now, the fields for which prefix lookup can be enabled are:
> - tun_id, tun_src, tun_dst
> - nw_src, nw_dst (or aliases ip_src and ip_dst)
> - ipv6_src, ipv6_dst
> 
> There is a maximum number of fields that can be enabled for any one
> flow table.  Currently this limit is 3.
> 
> Examples:
> 
> ovs-vsctl set Bridge br0 flow_tables:0=@N1 -- \
>  --id=@N1 create Flow_Table name=table0
> ovs-vsctl set Bridge br0 flow_tables:1=@N1 -- \
>  --id=@N1 create Flow_Table name=table1
> 
> ovs-vsctl set Flow_Table table0 prefixes=ip_dst,ip_src
> ovs-vsctl set Flow_Table table1 prefixes=[]
> 
> Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com>

If trie_lookup() returns UINT8_MAX, then it does not write anything to
its *checkbits argument.  After looking a little further, I guess that
this does not matter because it will never be read in that case.  Still,
I really like functions whose interfaces always write to output
arguments, because it makes later behavior of the program more
predictable in corner cases.

I am not certain that uint8_t is the best possible type for the width of
a field.  It is more than enough for the longest field that Open vSwitch
or OpenFlow has now, and it may be enough for the longest field that
either one will ever support.  However: NXM and OXM both in theory
support fields up to 1,016 bits wide.  I'll leave it to your judgment
whether it's worth worrying about that.

Did you consider making be_get_bit_at() and get_bit_at() return bool?

Every use of be_get_bit_at() is in an expression of the form
node->edges[be_get_bit_at(prefix, ofs)].  I don't know whether it would
be better to write a function to do that job.

The { should be on a separate line:
    trie_is_leaf(const struct trie_node *trie) {
and there should not be a ; after the } on the final line of the same
function.

In mask_set_prefix_bits(), I would write:
        mask[i] = OVS_BE32_MAX;
in place of:
        mask[i] = CONSTANT_HTONL(~0u);

I don't normally mark by-value parameters as const, as done on "const
uint8_t nbits" on mask_set_prefix_bits(), because I regard const on
parameters as mainly a promise to the caller not to modify an object
visible to the caller, but this kind of "const" does not make such a
promise.  Similarly for 'ofs' and 'plen' in trie_prefix_equal_bits() and
'prefix' in get_bit_at().

I think that the loop condition "while (node)" in trie_insert() can
never be false.  Can we equivalently write "for (;;)"?

There is a typo in a comment in trie_insert(): "it's" should be "its".

Can we eliminate the special case for the root in trie_insert()?  The
code appears to be written with this in mind but without following
through.  Something like this:

static void
trie_insert(struct cls_trie *trie, const struct cls_rule *rule, int mlen)
{
    const ovs_be32 *prefix = minimatch_get_prefix(&rule->match, trie->field);
    struct trie_node **edge;
    struct trie_node *node;
    int ofs;

    ofs = 0;
    for (edge = &trie->root;
         (node = *edge) != NULL;
         edge = &node->edges[be_get_bit_at(prefix, ofs)]) {
        unsigned int eqbits = trie_prefix_equal_bits(node, prefix, ofs, mlen);
        ofs += eqbits;
        if (eqbits < node->nbits) {
            /* Mismatch, new node needs to be inserted above. */
            int old_branch = get_bit_at(node->prefix, eqbits);

            /* New parent node. */
            *edge = trie_branch_create(prefix, ofs - eqbits, eqbits,
                                       ofs == mlen ? 1 : 0);

            /* Adjust old node for its new position in the tree. */
            node->prefix <<= eqbits;
            node->nbits -= eqbits;
            (*edge)->edges[old_branch] = node;

            /* Check if need a new branch for the new rule. */
            if (ofs < mlen) {
                (*edge)->edges[!old_branch]
                    = trie_branch_create(prefix, ofs, mlen - ofs, 1);
            }
            return;
        }
        /* Full match so far. */

        if (ofs == mlen) {
            /* Full match at the current node, rule needs to be added here. */
            node->n_rules++;
            return;
        }
    }
    *edge = trie_branch_create(prefix, ofs, mlen - ofs, 1);
}

I probably owe you an apology for fussing with your code too much.  This
is one way I study tricky code: I tweak it until I'm satisfied I
understand it.

In trie_remove(), should we log an error here?
            /* Mismatch, nothing to be removed.  This should never happen, as
             * only rules in the classifier are ever removed. */
and here?
    /* Cannot go deeper. This should never happen, since only rules
     * that actually exist in the classifier are ever removed. */

It might be possible to simplify the backtracking logic in
trie_remove(), by keeping the stack full of the edges that we followed
rather than the nodes that we visited.  Then "*edge = new_node;" would
suffice instead of needing to distinguish the root from other cases and
re-figure which direction to follow.  But I am not sure that making this
change is worth it.

Acked-by: Ben Pfaff <b...@nicira.com>
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to