[resending this to nox-dev for a wider audience]

Hello,

We've been investigating some odd behavior in our network and think
we've nailed down the cause.  The symptom was that the first packet
of a flow was sometimes dropped, but only for certain machines.

Upon investigation, it looks like when SNAC gets a packet_in for a
buffered packet, it responds with two messages: a flow_mod to insert a
new rule (with buffer_id set to NONE) and then a packet_out to release
the buffered packet with output action = TABLE.  We hypothesize that
the flow table insert does not always complete by the time the
packet_out arrives, and thus the buffered packet is dropped (or
misrouted).
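
To make the hypothesis concrete, here is a toy model of the race.  It
is only a sketch: ToySwitch, install_delay and first_packet_fate are
made-up names, the "ticks" are an abstract clock, and nothing here is
taken from NOX/SNAC or from any real switch firmware.  It assumes the
switch accepts a flow_mod right away but only makes the rule visible
to table lookups some time later.

```python
class ToySwitch:
    """Hypothetical switch whose flow table install is asynchronous."""

    def __init__(self, install_delay):
        self.install_delay = install_delay  # ticks until a rule becomes active
        self.now = 0
        self.pending = []   # (ready_at, match, out_port) not yet in the table
        self.table = {}     # match -> out_port, rules visible to lookups
        self.buffers = {}   # buffer_id -> match of the buffered packet

    def advance(self, ticks):
        # Move the clock forward and activate any rule whose install finished.
        self.now += ticks
        ready = [p for p in self.pending if p[0] <= self.now]
        self.pending = [p for p in self.pending if p[0] > self.now]
        for _, match, out_port in ready:
            self.table[match] = out_port

    def buffer_packet(self, buffer_id, match):
        self.buffers[buffer_id] = match

    def flow_mod(self, match, out_port):
        # The message is accepted now, but the rule is only queued.
        self.pending.append((self.now + self.install_delay, match, out_port))
        self.advance(0)

    def packet_out_table(self, buffer_id):
        # Output action = TABLE: run the buffered packet through the table.
        match = self.buffers.pop(buffer_id)
        out_port = self.table.get(match)
        return "dropped" if out_port is None else f"forwarded to port {out_port}"


def first_packet_fate(install_delay, gap=1):
    """Controller sends flow_mod, then packet_out `gap` ticks later."""
    sw = ToySwitch(install_delay)
    flow = "10.0.0.1->10.0.0.53"
    sw.buffer_packet(7, flow)
    sw.flow_mod(flow, out_port=3)
    sw.advance(gap)
    return sw.packet_out_table(7)


print(first_packet_fate(install_delay=0))  # install is synchronous
print(first_packet_fate(install_delay=5))  # install lags the packet_out
```

With install_delay=0 (roughly a single-threaded software switch) the
packet is forwarded; with an install that takes longer than the gap
between the two messages, the lookup misses and the packet is lost.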

This has probably gone unnoticed so far because (single-threaded)
software switches are not going to have this problem, and because a
race condition that drops a packet usually just affects performance,
not correctness.  I only noticed it because I'm connected directly to
the NEC, and it was particularly problematic for DNS requests.  With
3+ DNS servers and some timeout between requests, DNS was consistently
failing outright as a result (the newly inserted flow entry would
time out before the requests round-robined back to each DNS server).

IMHO, the best fix is to have the flow_mod also release the buffered
packet, but we're not sure about how NOX/SNAC's internals work, so
that might not be the easiest change.
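
For illustration, this is roughly what such a flow_mod looks like on
the wire.  It is a sketch under assumptions: it uses the OpenFlow 1.0
message layout (the protocol version actually in use here may differ),
build_flow_mod is a made-up helper and not a NOX/SNAC function, and
the match wildcards everything just to keep the example short.

```python
import struct

# Constants as defined in the OpenFlow 1.0 spec (assumed version).
OFP_VERSION = 0x01
OFPT_FLOW_MOD = 14
OFPFC_ADD = 0
OFPP_NONE = 0xFFFF
OFPFW_ALL = (1 << 22) - 1
OFPAT_OUTPUT = 0
NO_BUFFER = 0xFFFFFFFF  # buffer_id meaning "no buffered packet"


def build_flow_mod(xid, buffer_id, out_port, idle_timeout=5):
    """Pack an OFPFC_ADD flow_mod carrying a single output action."""
    # ofp_match is 40 bytes; wildcard every field and zero the rest.
    match = struct.pack("!I36x", OFPFW_ALL)
    # cookie, command, idle_timeout, hard_timeout, priority,
    # buffer_id, out_port, flags
    body = struct.pack("!QHHHHIHH", 0, OFPFC_ADD, idle_timeout, 0,
                       0x8000, buffer_id, OFPP_NONE, 0)
    # ofp_action_output: type, len, port, max_len
    action = struct.pack("!HHHH", OFPAT_OUTPUT, 8, out_port, 0)
    length = 8 + len(match) + len(body) + len(action)
    header = struct.pack("!BBHI", OFP_VERSION, OFPT_FLOW_MOD, length, xid)
    return header + match + body + action


# What is sent today: buffer_id = NONE, followed by a separate packet_out.
current = build_flow_mod(xid=1, buffer_id=NO_BUFFER, out_port=3)
# Proposed: reuse the buffer_id from the packet_in (42 is a placeholder).
proposed = build_flow_mod(xid=1, buffer_id=42, out_port=3)
print(len(current), len(proposed))
```

The only difference between the two messages is the buffer_id field.
As I read the 1.0 spec, a flow_mod with a valid buffer_id asks the
switch to apply the new flow to that buffered packet itself, so no
second message is needed and there is no gap for the race to hit.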

Let us know if this makes sense to you and if you know of an easy solution,

- Rob

