 > > Assuming that's the case, the bug
 > > seems to be that for IPv6, if we go through ill_restart_dad() ->
 > > ndp_do_recovery() -> ip_ndp_recover(), we forget to stop the recovery
 > > timer.  As a result, if the ipif again becomes a duplicate, then
 > > ip_ndp_excl() will clobber ipif_recovery_id as previously described.  This
 > > doesn't happen for IPv4 since ill_restart_dad() goes through ARP and then
 > > comes back via ip_arp_excl() which calls ipif_resolver_up() which cancels
 > > the recovery timer.
 > > 
 > > Please let me know what you think.  I'm testing out a fix to this right
 > > now using the stress test, and if you agree with the above analysis, I can
 > > integrate it with the Clearview IPMP wad.
 > 
 > Yes, I agree with that.  At the point where IPIF_DUPLICATE is cleared
 > in ip_ndp_recover, we should no longer have a running recovery timer.
 > 
 > ndp_do_recovery could check this and cancel the timer before doing
 > qwriter_ip on ip_ndp_recover.

So you'd rather I do the check/clear in ndp_do_recovery() and not
ip_ndp_recover()?  OK, I can make that change.  BTW, I also added ASSERTs
to check that ipif_recovery_id is 0 before we set it in ip_ndp_excl() and
ip_arp_excl() to make future problems easier to catch.
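
For reference, a minimal userland sketch of the change discussed above. It is a model only, not the actual patch: the ipif_model_t struct, the model_* function names, and the integer timer ids are all hypothetical stand-ins for the real ipif_t, ip_ndp_excl(), ndp_do_recovery(), ip_ndp_recover(), and the kernel timeout interfaces. It shows the recovery timer being cancelled before recovery proceeds, plus the assertion that ipif_recovery_id is 0 before it is set.

```c
#include <assert.h>

#define	IPIF_DUPLICATE	0x1u	/* stand-in value, not the real flag bit */

/* Hypothetical, stripped-down stand-in for ipif_t. */
typedef struct ipif_model {
	unsigned int	ipif_flags;
	int		ipif_recovery_id;	/* 0 means no timer pending */
} ipif_model_t;

static int next_timer_id = 1;	/* ids handed out by the fake timer */
static int timers_pending = 0;	/* timers armed and not yet cancelled */

/* Hypothetical stand-in for arming a kernel timeout. */
static int
model_timeout(void)
{
	timers_pending++;
	return (next_timer_id++);
}

/* Hypothetical stand-in for cancelling a kernel timeout. */
static void
model_untimeout(int id)
{
	(void) id;
	timers_pending--;
}

/* Models ip_ndp_excl(): mark the ipif duplicate and arm the timer. */
static void
model_ndp_excl(ipif_model_t *ipif)
{
	/* The added assertion: never overwrite a live timer id. */
	assert(ipif->ipif_recovery_id == 0);
	ipif->ipif_flags |= IPIF_DUPLICATE;
	ipif->ipif_recovery_id = model_timeout();
}

/* Models ip_ndp_recover(): the point where IPIF_DUPLICATE is cleared. */
static void
model_ndp_recover(ipif_model_t *ipif)
{
	assert(ipif->ipif_recovery_id == 0);
	ipif->ipif_flags &= ~IPIF_DUPLICATE;
}

/* Models ndp_do_recovery(): cancel any pending timer, then recover. */
static void
model_ndp_do_recovery(ipif_model_t *ipif)
{
	if (ipif->ipif_recovery_id != 0) {
		model_untimeout(ipif->ipif_recovery_id);
		ipif->ipif_recovery_id = 0;
	}
	model_ndp_recover(ipif);
}

/*
 * Duplicate, recover, duplicate again.  Returns the number of timers
 * still armed at the end; with the cancel in place this is exactly one.
 */
static int
model_run_scenario(void)
{
	ipif_model_t ipif = { 0, 0 };

	timers_pending = 0;
	model_ndp_excl(&ipif);
	model_ndp_do_recovery(&ipif);
	model_ndp_excl(&ipif);
	assert(ipif.ipif_flags & IPIF_DUPLICATE);
	return (timers_pending);
}
```

Without the cancel in model_ndp_do_recovery(), the second model_ndp_excl() call would trip its assertion instead of silently overwriting the old timer id, which is the kind of early failure the ASSERTs are meant to provide.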


-- 
meem
