On Wed, Dec 04, 2013 at 10:28:10AM -0800, Jarno Rajahalme wrote:
> On Dec 3, 2013, at 2:58 PM, Ben Pfaff <[email protected]> wrote:
> > Bugs (?)
> > ========
> > 
> > I think the intent is that the 'prefix' member of struct trie_node is
> > stored "left justified", so that the first bit of the prefix is
> > shifted into the MSB position of the uint32_t and the LSB position of
> > the uint32_t is only nonzero if 'nbits' is 32.  But...  I don't
> > understand this line of code in trie_equal_bits() and
> > trie_prefix_equal_bits():
> >    uint32_t diff = node->prefix ^ ntohl(value[ofs / 32]) << ofs % 32;
> > and similarly in trie_get_prefix():
> >    prefix = ntohl(*pr) << ofs % 32;
> > because both of these seem to shift by the wrong amount for this
> > interpretation.  If I've got 10 bits of prefix with ofs % 32 == 0, I
> > need to shift left by 22 bits, not by 0 bits.  So... explain?
> > 
> 
> ?ofs? is the bit offset to the network byte order prefix at ?value?
> from where the comparison should start. For example, if you have an
> IPv6 address of 128 bits, the most significant bit is at offset 0
> and the last bit is at offset 127. For 32-bit IPv4 addresses the MSB
> would be at offset 0 and the LSB at offset 31 (which may seem weird
> as LSB is typically considered to be the bit number 0). This way the
> code works the same for any prefix length. Also, the offset starting
> from MSB is in line with the bit numbering used for the IP addresses
> in the relevant RFCs.
> 
> So, if you have a ?/10? IPv4 prefix and you are at offset 0 into the
> prefix (i.e. comparison starting at the beginning of the prefix),
> the bits of interest would already be at the MSB position and the
> right amount to shift is 0 bits.

Ah.  I did not expect that (it is the opposite of the bit numbering
for our bitwise functions, e.g. bitwise_zero()).  The code now makes
sense to me.

Can we describe the bit numbering somewhere?  At least to say that the
MSB is called bit 0 and the LSB is called bit N-1 (or however it is
best phrased)?

I stopped reviewing when I didn't understand this, because I couldn't
make sense of the details of the rest of the code without this
context.  I'll start again when you send v5.

> > Style
> > =====
> > 
> > The comment in trie_lookup() is missing a word or two ("Check that can
> > skip based...").
> > 
> > I think that the check for !mf in trie_lookup() is unnecessary.  (Is
> > there any way to have a trie without a match field?)
> > 
> > The indentation in trie_init() here made this statement really
> > confusing at first glance:
> >        /* Initialize subtable's prefix length on this field. */
> >        subtable->trie_plen[trie_idx] = field
> >            ? minimask_get_prefix_len(&subtable->mask, field)
> >            : 0;
> > I see why you did it that way but maybe a larger rewrite would make
> > the whole loop body easier to understand, e.g.:
> >        uint8_t *plen = &subtable->trie_plen[trie_idx];
> >        *plen = field ? minimask_get_prefix_len(&subtable->mask, field) : 0;
> >        if (*plen) {
> >            struct cls_rule *head;
> > 
> >            HMAP_FOR_EACH (head, hmap_node, &subtable->rules) {
> >                struct cls_rule *rule;
> > 
> >                FOR_EACH_RULE_IN_LIST (rule, head) {
> >                    trie_insert(trie, rule, *plen);
> >                }
> >            }
> >        }
> > but who knows...
> > 
> 
> I like this better: 

OK.  I like that too.

> [...]

> > The 'tries' parameter to check_tries() is kind of funny.  This
> > parameter is a function of the cls_subtable, that is, it could be
> > stored in the cls_subtable at trie creation time and never touched
> > again.  But we recalculate it on every call to find_match_wc().  I
> > came up with various other ways to do it (e.g. move to struct
> > cls_subtable), but I'm not sure in the end that it's actually a
> > valuable optimization worth keeping.
> 
> Right, that was more of an experiment on my part. I simplified the
> code and removed the ?tries? range.

Thanks.

> > I'm also not sure that the be32ofs member of struct trie_ctx is
> > valuable and worth keeping.  I think that it can be trivially
> > eliminated:
> > 
> > diff --git a/lib/classifier.c b/lib/classifier.c
> > index 338872b..2c34cf2 100644
> > --- a/lib/classifier.c
> > +++ b/lib/classifier.c
> > @@ -471,7 +471,6 @@ classifier_remove(struct classifier *cls, struct 
> > cls_rule *rule)
> > struct trie_ctx {
> >     const struct cls_trie *trie;
> >     bool lookup_done;     /* Status of the lookup. */
> > -    uint8_t be32ofs;      /* U32 offset of the field in question. */
> >     uint8_t match_plen;   /* Longest prefix than could possibly match. */
> >     uint8_t maskbits;     /* Prefix length needed to avoid false matches. */
> > };
> > @@ -480,7 +479,6 @@ static void
> > trie_ctx_init(struct trie_ctx *ctx, const struct cls_trie *trie)
> > {
> >     ctx->trie = trie;
> > -    ctx->be32ofs = trie->field->flow_be32ofs;
> >     ctx->lookup_done = false;
> > }
> > 
> > @@ -989,10 +987,11 @@ check_tries(const struct flow *flow, struct trie_ctx 
> > trie_ctx[CLS_MAX_TRIES],
> >      * needed to avoid folding in additional bits to the wildcards mask. */
> >     for (j = tries->start; j < tries->end; j++) {
> >         struct trie_ctx *ctx = &trie_ctx[j];
> > +        int be32ofs = ctx->trie->field->flow_be32ofs;
> > 
> 
> I guess I wanted to avoid resolving this many indirections this deep
> in the classifier, especially when we typically do this multiple
> times for each subtable (in the typical use case we find that the
> trie field is not relevant for the first two ranges in the following
> if statement). Since we have the context for keeping track of
> on-demand lookup, we might as well use it for precomputing stuff we
> know would be repeated many times over otherwise.

OK.

> > Documentation
> > =============
> > 
> > I wrote some comments for classifier.h:
> > 
> 
> I have incorporated these with the minor corrections below, thanks!

Thanks.  I might write some more for v5 once I understand the code
better.
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev

Reply via email to