Re: [ovs-discuss] Can't send to controller when doing a resubmit and learn (regression)

Ben Pfaff Thu, 10 Jul 2014 11:38:32 -0700

On Mon, Apr 28, 2014 at 12:17:50PM -0700, Murphy McCauley wrote:
> 
> On Apr 23, 2014, at 9:03 PM, Ben Pfaff wrote:
> 
> > On Tue, Apr 22, 2014 at 11:29:55PM -0700, Murphy McCauley wrote:
> >> On Apr 22, 2014, at 8:54 AM, Ben Pfaff wrote:
> >> 
> >>> On Sat, Apr 19, 2014 at 07:50:31PM -0700, Murphy McCauley wrote:
> >>>> I recently found a technique I'd used with OVS 1.9 no longer worked 
> >>>> under OVS built from master a few days ago.  Here's a pretty minimal 
> >>>> example:
> >>>> 
> >>>> table=0, actions=resubmit(,2),resubmit(,1)
> >>>> table=1, reg1=0 actions=learn(table=2,hard_timeout=1,load:1->NXM_NX_REG1[]),controller
> >>>> 
> >>>> In this example, it's a poor man's controller rate limiter.  The 
> >>>> previous (and expected) behavior is that you can spam packets (e.g., 
> >>>> ping -i 0.1) and only one per second goes to the controller.  The 
> >>>> observed behavior on new versions of OVS is that nothing ever comes to 
> >>>> the controller.
> >>>> 
> >>>> Adding a reg1=1 match to table 1, it was clear the matching was working 
> >>>> right (the packet counts of the table 1 rules summed to the packet count 
> >>>> of the table 0 rule).  But still nothing at the controller.  A flood 
> >>>> action, however, works just fine -- one per second.  This got me 
> >>>> thinking it's a fast path/slow path issue.  I did some digging and found:
> >>>> 
> >>>> Before 4dff909 (Move odp_actions from subfacet to facet), things worked 
> >>>> as expected.  After this commit, it didn't work, but I found a 
> >>>> workaround based on a glance through the diff and a hunch: if I put a 
> >>>> controller action in the table 0 rule too, both controller actions 
> >>>> worked.  I was inspired to try this by the change around line 5027.  
> >>>> Without the table 0 controller action, facet_revalidate() gives up when 
> >>>> the facet goes from fast path to slow path.  With it, I am guessing it 
> >>>> starts out on the slow path and never changes.  Whether any of that is 
> >>>> significant or not, by sending to a nonexistent controller ID in table 
> >>>> 0, I had the behavior I wanted again.
> >>>> 
> >>>> Unfortunately, this workaround didn't work on master.  So more digging.  
> >>>> It turns out that after 3d9c5e5 (Handle learn action flow mods 
> >>>> asynchronously), the workaround wasn't required anymore and things were 
> >>>> back to working as expected.
> >>>> 
> >>>> Obviously this didn't last forever.  Specifically, once 9129672 (Move 
> >>>> "learn" actions into individual threads) more or less undid the 
> >>>> previous commit, even the workaround stopped working.
> >>>> 
> >>>> I tried to find anything related on the mailing list and didn't come up 
> >>>> with anything.  Is it unknown?  Is there any reason why this *shouldn't* 
> >>>> work?  Any thoughts on getting it to work again?
> >>> 
> >>> At a glance, this should work (although it's not a use case I've
> >>> considered before).  It's not obvious to me why it doesn't.  If you
> >>> figure out a fix (though I'd like to take a look myself, I don't have
> >>> the time), please submit it, and then we'll add a test to avoid future
> >>> regression.
> >> 
> >> Hi, Ben, thanks for confirming that it should work.
> >> 
> >> I believe the reason it doesn't lies in handle_upcalls(), which calls 
> >> xlate_actions() without the packet and later calls 
> >> xlate_actions_for_side_effects() with the packet which should actually 
> >> send to the controller.  Unfortunately, by the time the actions are xlated 
> >> the second time, the world has changed due to the flow_mod resulting from 
> >> the learn action the first time they were xlated.  The result is that 
> >> things go differently when trying to run the side effects -- we now hit 
> >> the newly learned rule.  Before 9129672, the flow_mods were queued and I 
> >> guess hadn't actually executed when xlate_actions_for_side_effects() ran.
> >> I think the additional xlate stems from bcd2633 (Store relevant fields for 
> >> wildcarding in facet).
> >> 
> >> I don't know the code well enough to know if there's a particularly 
> >> elegant way of solving this.  Someone else must have a better idea. :)
> >> 
> >> A sidenote is that the actions are xlated both times with may_learn, which 
> >> seemed odd to me.  Just for fun I turned it off the second time (which, of 
> >> course, is the not-useful-to-me case), and it didn't change the results of 
> >> make check, for whatever that's worth.
> > 
> > Thanks for the insight.  That ought to help.
> 
> So one way it had occurred to me to resolve this was to cache results from 
> the first xlate_actions() so that the one for side effects would be a replay 
> and not an entirely new process.  Although used for a different aim, it seems 
> like there's some conceptual crossover here with Joe Stringer's "Cache the 
> modules affected by xlate_actions()" series of patches.  Just thought I'd 
> mention this (CCing in Joe Stringer) in case anyone had any deeper thoughts.
> 
> -- Murphy
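
For anyone following along, the ordering problem described in the quoted
analysis can be sketched as a toy model.  This is not OVS code: ToySwitch and
its translate(), flush(), and handle_upcall() methods are hypothetical
stand-ins for the two translation passes, and the model ignores hard_timeout
entirely.  It only assumes what the thread states: the actions are translated
once without the packet and once more for side effects, and the learn either
takes effect immediately or is queued until later.

```python
class ToySwitch:
    """Toy model of the two-pass translation; not real OVS behavior."""

    def __init__(self, apply_learn_immediately):
        self.apply_learn_immediately = apply_learn_immediately
        self.learned = False          # stands in for the learned flow in table 2
        self.pending = False          # stands in for a queued flow_mod
        self.controller_packets = 0

    def translate(self, has_packet):
        # resubmit(,2): the learned flow, if present, loads reg1=1.
        reg1 = 1 if self.learned else 0
        # resubmit(,1): the table 1 flow only matches reg1=0.
        if reg1 == 0:
            if self.apply_learn_immediately:
                self.learned = True   # flow_mod takes effect right away
            else:
                self.pending = True   # flow_mod is queued for later
            if has_packet:
                self.controller_packets += 1  # the "controller" action

    def flush(self):
        # Apply any queued flow_mod.
        if self.pending:
            self.learned = True
            self.pending = False

    def handle_upcall(self):
        self.translate(has_packet=False)  # first pass, no packet
        self.translate(has_packet=True)   # second pass, for side effects
        self.flush()


queued = ToySwitch(apply_learn_immediately=False)
queued.handle_upcall()

immediate = ToySwitch(apply_learn_immediately=True)
immediate.handle_upcall()

print(queued.controller_packets, immediate.controller_packets)
```

In the queued case the second pass still sees the old tables, so the packet
reaches the controller.  In the immediate case the second pass hits the newly
learned flow and nothing is sent, which matches the symptom reported.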


Murphy, did this ever get fixed to your satisfaction?
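
For reference, the "replay" idea from the quoted message could look roughly
like the sketch below.  It is only an illustration under assumptions:
ReplaySwitch, translate(), and replay() are hypothetical names, not OVS
functions, and hard_timeout is not modeled.  The point is that the side
effects are recorded during the single translation and replayed afterward,
rather than recomputed against tables that the learn has already changed.

```python
class ReplaySwitch:
    """Toy model: translate once, record side effects, replay them later."""

    def __init__(self):
        self.learned = False          # stands in for the learned flow in table 2
        self.controller_packets = 0

    def translate(self):
        """Translate once, applying the learn immediately.

        Returns the list of side effects that this translation decided on.
        """
        effects = []
        reg1 = 1 if self.learned else 0   # resubmit(,2)
        if reg1 == 0:                     # resubmit(,1), match reg1=0
            self.learned = True           # learn takes effect right away
            effects.append("controller")  # remember the decision
        return effects

    def replay(self, effects):
        # Execute the recorded side effects without consulting the tables.
        for effect in effects:
            if effect == "controller":
                self.controller_packets += 1

    def handle_upcall(self):
        effects = self.translate()
        self.replay(effects)


sw = ReplaySwitch()
sw.handle_upcall()   # first packet: learn installs, controller sees it
sw.handle_upcall()   # second packet: learned flow suppresses it
print(sw.controller_packets)
```

Because the replay does not look at the flow tables again, the first packet
still reaches the controller even though the learned flow is already
installed.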