On Dec 7, 2013, at 2:04 PM, Ben Pfaff <b...@nicira.com> wrote: > On Thu, Dec 05, 2013 at 04:27:26PM -0800, Jarno Rajahalme wrote: >> Add a prefix tree (trie) structure for tracking the used address >> space, enabling skipping classifier tables containing longer masks >> than necessary for an address field value in a packet header being >> classified. This enables less unwildcarding for datapath flows in >> parts of the address space without host routes. >> >> Trie lookup is interwoven to the staged lookup, so that a trie is >> searched only when the configured trie field becomes relevant >> for the lookup. The trie lookup results are retained so that each >> trie is checked at most once for each classifier lookup. >> >> This implementation tracks the number of rules at each address prefix >> for the whole classifier. More aggressive table skipping would be >> possible by maintaining lists of tables that have prefixes at the >> lengths encountered on tree traversal, or by maintaining separate >> tries for subsets of rules separated by metadata fields. >> >> Prefix tracking is configured via OVSDB. A new column "prefixes" is >> added to the database table "Flow_Table". "prefixes" is a set of >> string values listing the field names for which prefix lookup should >> be used. >> >> As of now, the fields for which prefix lookup can be enabled are: >> - tun_id, tun_src, tun_dst >> - nw_src, nw_dst (or aliases ip_src and ip_dst) >> - ipv6_src, ipv6_dst >> >> There is a maximum number of fields that can be enabled for any one >> flow table. Currently this limit is 3. >> >> Examples: >> >> ovs-vsctl set Bridge br0 flow_tables:0=@N1 -- \ >> --id=@N1 create Flow_Table name=table0 >> ovs-vsctl set Bridge br0 flow_tables:1=@N1 -- \ >> --id=@N1 create Flow_Table name=table1 >> >> ovs-vsctl set Flow_Table table0 prefixes=ip_dst,ip_src >> ovs-vsctl set Flow_Table table1 prefixes=[] >> >> Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com> > > If trie_lookup() returns UINT8_MAX, then it does not write anything to > its *checkbits argument. After looking a little further, I guess that > this does not matter because it will never be read in that case. Still, > I really like functions whose interfaces always write to output > arguments, because it makes later behavior of the program more > predictable in corner cases. >
OK > I am not certain that uint8_t is the best possible type for the width of > a field. It is more than enough for the longest field that Open vSwitch > or OpenFlow has now, and it may be enough for the longest field that > either one will ever support. However: NXM and OXM both in theory > support fields up to 1,016 bits wide. I'll leave it to your judgment > whether it's worth worrying about that. > I changed the relevant uint8_t types to unsigned int types, even though having a prefix longer than 255 bits is a rather remote possibility. > Did you consider making be_get_bit_at() and get_bit_at() return bool? > If they return bool, the caller must use !! to get 1 from the non-zero return value to use it as an index. I’d rather have these return the bit itself. > Every use of be_get_bit_at() is in an expression of the form > node->edges[be_get_bit_at(prefix, ofs)]. I don't know whether it would > be better to write a function to do that job. > I added functions trie_next_node() and trie_next_edge() for this purpose. > The { should be on a separate line: > trie_is_leaf(const struct trie_node *trie) { > and there should not be a ; after the } on the final line of the same > function. Oops :-) > > In mask_set_prefix_bits(), I would write: > mask[i] = OVS_BE32_MAX; > in place of: > mask[i] = CONSTANT_HTONL(~0u); > Done. > I don't normally mark by-value parameters as const, as done on "const > uint8_t nbits" on mask_set_prefix_bits(), because I regard const on > parameters as mainly a promise to the caller not to modify an object > visible to the caller, but this kind of "const" does not make such a > promise. Similarly for 'ofs' and 'plen' in trie_prefix_equal_bits() and > 'prefix' in get_bit_at(). > Cleaned up. > I think that the loop condition "while (node)" in trie_insert() can > never be false. Can we equivalently write "for (;;)”? > Right. > There is a typo in a comment in trie_insert(): "it's" should be "its”. > Fixed, thanks! > Can we eliminate the special case for the root in trie_insert()? The > code appears to be written with this in mind but without following > through. Something like this: > > static void > trie_insert(struct cls_trie *trie, const struct cls_rule *rule, int mlen) > { > const ovs_be32 *prefix = minimatch_get_prefix(&rule->match, trie->field); > struct trie_node **edge; > struct trie_node *node; > int ofs; > > ofs = 0; > for (edge = &trie->root; > (node = *edge) != NULL; > edge = &node->edges[be_get_bit_at(prefix, ofs)]) { > unsigned int eqbits = trie_prefix_equal_bits(node, prefix, ofs, mlen); > ofs += eqbits; > if (eqbits < node->nbits) { > /* Mismatch, new node needs to be inserted above. */ > int old_branch = get_bit_at(node->prefix, eqbits); > > /* New parent node. */ > *edge = trie_branch_create(prefix, ofs - eqbits, eqbits, > ofs == mlen ? 1 : 0); > > /* Adjust old node for its new position in the tree. */ > node->prefix <<= eqbits; > node->nbits -= eqbits; > (*edge)->edges[old_branch] = node; > > /* Check if need a new branch for the new rule. */ > if (ofs < mlen) { > (*edge)->edges[!old_branch] > = trie_branch_create(prefix, ofs, mlen - ofs, 1); > } > return; > } > /* Full match so far. */ > > if (ofs == mlen) { > /* Full match at the current node, rule needs to be added here. */ > node->n_rules++; > return; > } > } > *edge = trie_branch_create(prefix, ofs, mlen - ofs, 1); > } > This is a nice rearrangement, thank you! I did the corresponding change to trie_remove(). > I probably owe you an apology for fussing with your code too much. This > is one way I study tricky code: I tweak it until I'm satisfied I > understand it. > Not at all, I’m all for better code :-) > In trie_remove(), should we log an error here? > /* Mismatch, nothing to be removed. This should never happen, as > * only rules in the classifier are ever removed. */ > and here? > /* Cannot go deeper. This should never happen, since only rules > * that actually exist in the classifier are ever removed. */ I had ovs_asserts here previously, so it makes sense to log warnings here. > > It might be possible to simplify the backtracking logic in > trie_remove(), by keeping the stack full of the edges that we followed > rather than the nodes that we visited. Then "*edge = new_node;" would > suffice instead of needing to distinguish the root from other cases and > re-figure which direction to follow. But I am not sure that making this > change is worth it. > Done. > Acked-by: Ben Pfaff <b...@nicira.com> You’ll get to say what you think of the result before I push, though. I’d like you to review trie_lookup_value(), trie_insert() and trie_remove() once more. I’d like your OK on the trie manipulation logic (adding and removing nodes). Both of these are constrained by the maximum number of prefix bits a single node can hold (32). Some of the node addition detail is in the trie_branch_create(), which can add create a chain of nodes. Jarno
_______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev