[resending this to nox-dev for a wider audience]

Hello,
We've been investigating some weird behavior in our network and think we've nailed down the cause. The symptom was that the first packet of a flow was sometimes getting dropped, but only for certain machines.

Upon investigation, it looks like, given a packet_in with a buffered packet, SNAC responds with two messages: a flow mod to insert a new rule (with bufferid set to NONE), and then a packet_out to release the buffered packet with output action = TABLE. We hypothesize that the flow table insert does not always complete by the time the packet_out arrives, and thus the buffered packet is dropped (or misrouted).

This has probably gone unnoticed so far for two reasons. First, (single-threaded) software switches are not going to have this problem. Second, because it's a race condition that causes a packet drop, it usually just affects performance and not correctness.

I've only noticed it because I'm connected directly to the NEC switch, and this was particularly problematic for DNS requests. With 3+ DNS servers and some timeout between requests, DNS was consistently failing completely as a result: the newly inserted flow entry would time out before the requests round-robined back to each DNS server.

IMHO, the best fix is to have the flow_mod also release the buffered packet, but we're not sure how NOX/SNAC's internals work, so that might not be the easiest change.

Let us know if this makes sense to you and if you know of an easy solution,

- Rob
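
P.S. In case it helps, here is a toy Python model of the race we suspect. It is not NOX/SNAC or switch code: ToySwitch, run(), the tick-based insert delay, and the choice to count a table miss as a drop are all assumptions made up for illustration. It only shows why sending flow_mod (bufferid = NONE) followed by packet_out (action = TABLE) can lose the buffered packet when the insert is slow, and why letting the flow_mod carry the buffer id would avoid that.

```python
OFPP_TABLE = "TABLE"  # stand-in for the "send through the flow table" output action


class ToySwitch:
    """Hypothetical switch model: flow inserts become visible only after a delay."""

    def __init__(self, insert_delay):
        self.insert_delay = insert_delay  # ticks before a flow_mod takes effect
        self.now = 0
        self.table = {}       # match -> output port
        self.pending = []     # (ready_tick, match, port, buffer_id)
        self.buffers = {}     # buffer_id -> buffered packet (identified by its match)
        self.forwarded = []   # (packet, out_port)
        self.dropped = []     # packets that missed the table on release
        self._next_buffer = 0

    def packet_in(self, packet):
        """Buffer a packet that missed the table and return its buffer id."""
        buffer_id = self._next_buffer
        self._next_buffer += 1
        self.buffers[buffer_id] = packet
        return buffer_id

    def flow_mod(self, match, port, buffer_id=None):
        """Queue a flow insert; optionally release a buffered packet once it lands."""
        self.pending.append((self.now + self.insert_delay, match, port, buffer_id))
        self._commit_ready()

    def packet_out(self, buffer_id, action=OFPP_TABLE):
        """Release a buffered packet through whatever the table holds right now."""
        assert action == OFPP_TABLE, "this sketch only models output to TABLE"
        self._release(buffer_id)

    def tick(self, ticks=1):
        """Advance time, committing any inserts that have become ready."""
        for _ in range(ticks):
            self.now += 1
            self._commit_ready()

    def _commit_ready(self):
        still_pending = []
        for ready, match, port, buffer_id in self.pending:
            if ready <= self.now:
                self.table[match] = port
                if buffer_id is not None:
                    self._release(buffer_id)  # released only after the rule exists
            else:
                still_pending.append((ready, match, port, buffer_id))
        self.pending = still_pending

    def _release(self, buffer_id):
        packet = self.buffers.pop(buffer_id)
        if packet in self.table:
            self.forwarded.append((packet, self.table[packet]))
        else:
            self.dropped.append(packet)  # assumption: a table miss here is a drop


def run(insert_delay, release_with_flow_mod):
    """Replay one packet_in and return (forwarded, dropped)."""
    sw = ToySwitch(insert_delay)
    buf = sw.packet_in("dns-query")
    if release_with_flow_mod:
        # Proposed fix: one message inserts the rule and releases the buffer.
        sw.flow_mod("dns-query", port=2, buffer_id=buf)
    else:
        # Behavior we observed: two messages, bufferid = NONE on the flow mod.
        sw.flow_mod("dns-query", port=2)
        sw.packet_out(buf)
    sw.tick(insert_delay)
    return sw.forwarded, sw.dropped


if __name__ == "__main__":
    print("instant insert, two messages:", run(0, False))
    print("slow insert, two messages:   ", run(3, False))
    print("slow insert, single flow_mod:", run(3, True))
```

With an insert delay of 0 (our stand-in for a single-threaded software switch) the two-message sequence forwards the packet. With a nonzero delay it is dropped, while the single flow_mod carrying the buffer id still forwards it.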
