Jarno Yes, I can verify that your patch fixes my specially constructed test case. Thanks very much for this, I look forward to my next update from upstream when I can throw away my temporary fix.
Tony On Wed, Sep 16, 2015 at 12:00 PM, Jarno Rajahalme <[email protected]> wrote: > Tony, > > Sorry that it took some time to get to this, and thanks for the detailed > report, with it it was straightforward to add a test case verifying what is > happening here. > > As confusing as it may be, the datapath flow dumps that show no vlan matches > actually exact match the non-presence of vlans. This is due to how the > netlink flow key interface has been designed, and relates to the fact that > vlans are expressed with a specific ethertype, the vlan field, and the rest > of the match headers in encapsulated netlink attributes. To enforce this, > the Linux kernel module sets the vlan mask to all ones by default, which > will then be overwritten only if there is an actual match on vlans. OVS > userspace compensates for this when translating a netlink flow mask to OVS > internal struct flow representation by also defaulting the vlan_tci field to > all ones (0xffff). Hence, the existing code in revalidate_ukey() function is > correct and does not need a fix. > > However, the userspace datapath, that is used for testing (I believe also > with oftest), did not do anything special with the vlan_tci field, and was > happy to wildcard it. Since the netlink format is still used when > revalidating and dumping the userspace datapath flows, the same compensation > described above was done also with userspace datapath flows, which indeed > caused the existing megaflow to be modified and then matched regardless of > the presence of vlan tags in incoming packets. I have sent out a patch that > changes the userspace datapath to also force exact matching the vlan_tci > field and adds a test case verifying that this works as intended. > > You can find the patch here: > http://openvswitch.org/pipermail/dev/2015-September/060083.html > > It would be superb it you could verify that this fixes your test case > problems. > > Regards, > > Jarno > > > On Aug 18, 2015, at 6:49 PM, Tony van der Peet <[email protected]> > wrote: > > Ethan > > Thanks for the response. I have made some further progress, and in > fact have discovered a slightly crude fix that allows my tests to > pass. In ofproto-dpif-upcall.c:revalidate_ukey I have done this: > > /* Since the kernel is free to ignore wildcarded bits in the mask, we > can't > * directly check that the masks are the same. Instead we check that the > * mask in the kernel is more specific i.e. less wildcarded, than what > * we've calculated here. This guarantees we don't catch any packets we > * shouldn't with the megaflow. */ > // start of "fix" > /* To avoid all sorts of bother about special meanings of VLAN tags (VLAN > * present, VLAN absent, etc) let's just require the VLAN masks to be > equal. > */ > if (dp_mask.vlan_tci != wc.masks.vlan_tci) { > goto exit; > } > // end of "fix" > dp64 = (uint64_t *) &dp_mask; > xout64 = (uint64_t *) &wc.masks; > for (i = 0; i < FLOW_U64S; i++) { > if ((dp64[i] | xout64[i]) != dp64[i]) { > goto exit; > } > } > > Obviously, "goto exit" cannot be wrong, it's just affecting > performance. Whether it's the best possible fix is another question! > > This is the test I wrote using the oftest framework: > > class FlowInternalModify(MatchTest): > """ > Match on ipv4 protocol field (UDP), then change to match on VLAN absence. > """ > def runTest(self): > # Start out with no flows > delete_all_flows(self.controller) > do_barrier(self.controller) > time.sleep(2) > > match = ofp.match([ > ofp.oxm.eth_type(0x0800), > ofp.oxm.ip_proto(17), > ]) > > matching = { > "udp": simple_udp_packet(), > } > > nonmatching = { > #"tcp": simple_tcp_packet(), > #"icmp": simple_icmp_packet(), > } > > self.verify_match(match, matching, nonmatching) > time.sleep(3) > > # Interlude, delete all flows > delete_all_flows(self.controller) > do_barrier(self.controller) > time.sleep(2) > > match = ofp.match([ > ofp.oxm.vlan_vid(ofp.OFPVID_NONE), > ]) > > matching = { > "no vlan tag": simple_udp_packet() > } > > nonmatching = { > "vid=2 pcp=3": simple_udp_packet(dl_vlan_enable=True, > vlan_vid=2, vlan_pcp=3), > "vid=0 pcp=7": simple_udp_packet(dl_vlan_enable=True, > vlan_vid=0, vlan_pcp=7), > "vid=2 pcp=0": simple_udp_packet(dl_vlan_enable=True, > vlan_vid=2, vlan_pcp=0), > } > > self.verify_match(match, matching, nonmatching) > > Here's a run of this test without the "fix", with rules and flows > being shown alternately while the test runs (some editing has been > carried out). > > awplus#sh open flow > > > awplus#sh open rule > table_id=254, duration=57s, n_packets=0, n_bytes=0, > priority=2,recirc_id=0,actions=drop > table_id=254, duration=57s, n_packets=0, n_bytes=0, > priority=0,reg0=0x1,actions=controller(reason=no_match) > table_id=254, duration=57s, n_packets=0, n_bytes=0, > priority=0,reg0=0x2,actions=drop > table_id=254, duration=57s, n_packets=0, n_bytes=0, > priority=0,reg0=0x3,actions=drop > > Test just started, deleted all rules and flows are aged out > > > awplus#sh open rule > duration=0s, n_packets=0, n_bytes=0, priority=1000,udp,actions=output:2 > duration=0s, n_packets=0, n_bytes=0, priority=1,actions=CONTROLLER:65535 > table_id=254, duration=61s, n_packets=0, n_bytes=0, > priority=2,recirc_id=0,actions=drop > table_id=254, duration=61s, n_packets=0, n_bytes=0, > priority=0,reg0=0x1,actions=controller(reason=no_match) > table_id=254, duration=61s, n_packets=0, n_bytes=0, > priority=0,reg0=0x2,actions=drop > table_id=254, duration=61s, n_packets=0, n_bytes=0, > priority=0,reg0=0x3,actions=drop > awplus#sh open flow > recirc_id(0),in_port(1),eth_type(0x0800),ipv4(proto=17,frag=no), > packets:0, bytes:0, used:never, actions:2 > > First part of test, rule and flow are OK and in sync > > > awplus#sh open rule > table_id=254, duration=65s, n_packets=0, n_bytes=0, > priority=2,recirc_id=0,actions=drop > table_id=254, duration=65s, n_packets=0, n_bytes=0, > priority=0,reg0=0x1,actions=controller(reason=no_match) > table_id=254, duration=65s, n_packets=0, n_bytes=0, > priority=0,reg0=0x2,actions=drop > table_id=254, duration=65s, n_packets=0, n_bytes=0, > priority=0,reg0=0x3,actions=drop > awplus#sh open flow > recirc_id(0),in_port(1),eth_type(0x0800),ipv4(proto=17,frag=no), > packets:1, bytes:104, used:2.768s, actions:drop > > Rule has been deleted but flow hangs around, but with its action changed to > drop, as expected. > > > awplus#sh open rule > duration=1s, n_packets=0, n_bytes=0, > priority=1000,vlan_tci=0x0000/0x1fff,actions=output:2 > duration=1s, n_packets=0, n_bytes=0, priority=1,actions=CONTROLLER:65535 > table_id=254, duration=70s, n_packets=0, n_bytes=0, > priority=2,recirc_id=0,actions=drop > table_id=254, duration=70s, n_packets=0, n_bytes=0, > priority=0,reg0=0x1,actions=controller(reason=no_match) > table_id=254, duration=70s, n_packets=0, n_bytes=0, > priority=0,reg0=0x2,actions=drop > table_id=254, duration=70s, n_packets=0, n_bytes=0, > priority=0,reg0=0x3,actions=drop > awplus#sh open flow > recirc_id(0),in_port(1),eth_type(0x0800),ipv4(proto=17,frag=no), > packets:4, bytes:416, used:0.219s, actions:2 > > New rule has been put in place, but whoops! old flow is still there and has > inherited the new rule's > action. This is not so good because the flow will take UDP packets > regardless of the presence of > a VLAN tag. > > > awplus#sh open rule > duration=11s, n_packets=3, n_bytes=312, > priority=1000,vlan_tci=0x0000/0x1fff,actions=output:2 > duration=11s, n_packets=0, n_bytes=0, priority=1,actions=CONTROLLER:65535 > table_id=254, duration=80s, n_packets=0, n_bytes=0, > priority=2,recirc_id=0,actions=drop > table_id=254, duration=80s, n_packets=0, n_bytes=0, > priority=0,reg0=0x1,actions=controller(reason=no_match) > table_id=254, duration=80s, n_packets=0, n_bytes=0, > priority=0,reg0=0x2,actions=drop > table_id=254, duration=80s, n_packets=0, n_bytes=0, > priority=0,reg0=0x3,actions=drop > awplus#sh open flow > > Test over and flow has aged out. Test does not get rid of rule. > > > > WIth the above mentioned fix in place, here's what happens: > > awplus#sh open flow > > > awplus#sh open rule > table_id=254, duration=21s, n_packets=0, n_bytes=0, > priority=2,recirc_id=0,actions=drop > table_id=254, duration=21s, n_packets=0, n_bytes=0, > priority=0,reg0=0x1,actions=controller(reason=no_match) > table_id=254, duration=21s, n_packets=0, n_bytes=0, > priority=0,reg0=0x2,actions=drop > table_id=254, duration=21s, n_packets=0, n_bytes=0, > priority=0,reg0=0x3,actions=drop > > Before test runs, switch has just rebooted, so nothing there. > > > awplus#sh open rule > duration=0s, n_packets=0, n_bytes=0, priority=1000,udp,actions=output:2 > duration=0s, n_packets=0, n_bytes=0, priority=1,actions=CONTROLLER:65535 > table_id=254, duration=31s, n_packets=0, n_bytes=0, > priority=2,recirc_id=0,actions=drop > table_id=254, duration=31s, n_packets=0, n_bytes=0, > priority=0,reg0=0x1,actions=controller(reason=no_match) > table_id=254, duration=31s, n_packets=0, n_bytes=0, > priority=0,reg0=0x2,actions=drop > table_id=254, duration=31s, n_packets=0, n_bytes=0, > priority=0,reg0=0x3,actions=drop > awplus#sh open flow > recirc_id(0),in_port(1),eth_type(0x0800),ipv4(proto=17,frag=no), > packets:0, bytes:0, used:never, actions:2 > > First part of test, same as before, flow and rule are in sync > > > awplus#sh open rule > table_id=254, duration=35s, n_packets=0, n_bytes=0, > priority=2,recirc_id=0,actions=drop > table_id=254, duration=35s, n_packets=0, n_bytes=0, > priority=0,reg0=0x1,actions=controller(reason=no_match) > table_id=254, duration=35s, n_packets=0, n_bytes=0, > priority=0,reg0=0x2,actions=drop > table_id=254, duration=35s, n_packets=0, n_bytes=0, > priority=0,reg0=0x3,actions=drop > awplus#sh open flow > > > Rule deleted in the interlude, and my new "fix" causes the flow to be > deleted. > > > awplus#sh open rule > duration=0s, n_packets=0, n_bytes=0, > priority=1000,vlan_tci=0x0000/0x1fff,actions=output:2 > duration=0s, n_packets=0, n_bytes=0, priority=1,actions=CONTROLLER:65535 > table_id=254, duration=38s, n_packets=0, n_bytes=0, > priority=2,recirc_id=0,actions=drop > table_id=254, duration=38s, n_packets=0, n_bytes=0, > priority=0,reg0=0x1,actions=controller(reason=no_match) > table_id=254, duration=38s, n_packets=0, n_bytes=0, > priority=0,reg0=0x2,actions=drop > table_id=254, duration=38s, n_packets=0, n_bytes=0, > priority=0,reg0=0x3,actions=drop > awplus#sh open flow > recirc_id(0),in_port(1),eth_type(0x0800),ipv4(frag=no), packets:0, > bytes:0, used:never, actions:2 > recirc_id(0),in_port(1),eth_type(0x8100),vlan(vid=0,pcp=7/0x0),encap(eth_type(0x0800),ipv4(frag=no)), > packets:1, bytes:100, used: > 0.368s, actions:userspace(pid=0,slow_path(controller)) > recirc_id(0),in_port(1),eth_type(0x8100),vlan(vid=2),encap(eth_type(0x0800),ipv4(frag=no)), > packets:3, bytes:300, used:0.344s, ac > tions:userspace(pid=0,slow_path(controller)) > > The second part of the test is running and things are looking more like they > do > when the second part is run as a test on its own. > > > awplus#sh open rule > duration=11s, n_packets=2, n_bytes=204, > priority=1000,vlan_tci=0x0000/0x1fff,actions=output:2 > duration=11s, n_packets=6, n_bytes=600, priority=1,actions=CONTROLLER:65535 > table_id=254, duration=50s, n_packets=0, n_bytes=0, > priority=2,recirc_id=0,actions=drop > table_id=254, duration=50s, n_packets=0, n_bytes=0, > priority=0,reg0=0x1,actions=controller(reason=no_match) > table_id=254, duration=50s, n_packets=0, n_bytes=0, > priority=0,reg0=0x2,actions=drop > table_id=254, duration=50s, n_packets=0, n_bytes=0, > priority=0,reg0=0x3,actions=drop > awplus#sh open flow > > And eventually the flow times out. > > > So that's it! > > To answer your question, you can see that the flow hangs around, with > its action changed, but it is no longer really representative of the > new rule since it covers all UDP packets regardless of whether there > is a VLAN tag in the packet or not. The non-matching cases are meant > to be sent to the controller, but the flow sends them to port 2 > instead. > > By the way, your apology was totally unnecessary! You and the team do > such a great job on this code and quality is so high that I moved to > tip of master a long time ago and have never regretted it. The number > of issues I find in the code is tiny. Thanks for all your efforts. > > Tony > > > On Wed, Aug 19, 2015 at 12:07 PM, Ethan Jackson <[email protected]> wrote: > > Hi Tony, > > Sorry you're running into this, it does sound like a problem this > patch could cause. I suppose what I'm having trouble working out here > is whether this is an actual bug in the patch, or if you're test > framework is making assumptions about how the code works which are no > longer valid. Any additional information you have on your setup would > be especially helpful, especially if you can whittle it down into a > somewhat minimal series of steps which reproduces the problem. > > Some comments inline: > > In a nutshell, flows created for earlier tests are being kept alive > and having their actions modified, but then the modification as a > result of a later test causes that later test to fail. > > > Correct, as a result of this patch, we try to modify flows in the > datapath in place, rather than deleting them when their underlying > openflow rules changes. > > > when the rule becomes: > > duration=2s, n_packets=0, n_bytes=0, > priority=1000,vlan_tci=0x0000/0x1fff,actions=output:2 > > > So what you're saying is, you add a rule that matches on VLAN tag, but > the datapath rule corresponding to it never updates? So all your > traffic forwards based on the old rule not the new one? > > Ethan > > > Disregarding the flow's age, any UDP packet sent, VLAN or not, will > result in output:2, but this is not the correct action for all packets > (those with a VLAN tag). > > My guess is that this is fairly deep and beyond my limited experience > with OpenVSwitch. However, I will continue to work this issue, but > thought you might like to know. And any advice much appreciated. > > Tony > _______________________________________________ > discuss mailing list > [email protected] > http://openvswitch.org/mailman/listinfo/discuss > > > _______________________________________________ discuss mailing list [email protected] http://openvswitch.org/mailman/listinfo/discuss
