> > Assuming that's the case, the bug
> > seems to be that for IPv6, if we go through ill_restart_dad() ->
> > ndp_do_recovery() -> ip_ndp_recover(), we forget to stop the recovery
> > timer. As a result, if the ipif again becomes a duplicate, then
> > ip_ndp_excl() will clobber ipif_recovery_id as previously described. This
> > doesn't happen for IPv4 since ill_restart_dad() goes through ARP and then
> > comes back via ip_arp_excl() which calls ipif_resolver_up() which cancels
> > the recovery timer.
> >
> > Please let me know what you think. I'm testing out a fix to this right
> > now using the stress test, and if you agree with the above analysis, I can
> > integrate it with the Clearview IPMP wad.
>
> Yes, I agree with that. At the point where IPIF_DUPLICATE is cleared
> in ip_ndp_recover, we should no longer have a running recovery timer.
>
> ndp_do_recovery could check this and cancel the timer before doing
> qwriter_ip on ip_ndp_recover.

So you'd rather I do the check/clear in ndp_do_recovery() and not
ip_ndp_recover()? OK, I can make that change.

BTW, I also added ASSERTs to check that ipif_recovery_id is 0 before we
set it in ip_ndp_excl() and ip_arp_excl() to make future problems easier
to catch.

--
meem
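
For illustration only, here is a minimal user-space sketch of the invariant discussed above: the recovery timer is cancelled before the duplicate state is cleared, and an assertion guards against arming a second timer over a pending one. This is not the kernel code. The struct and every model_* name are hypothetical stand-ins for ipif_t, timeout()/untimeout(), ip_ndp_excl()/ip_arp_excl(), ndp_do_recovery() and ip_ndp_recover(), and the qwriter_ip() hand-off is collapsed into a direct assignment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's ipif_t. */
typedef struct ipif_model {
	bool	duplicate;	/* models the IPIF_DUPLICATE flag */
	int	recovery_id;	/* models ipif_recovery_id; 0 = no timer */
} ipif_model_t;

static int next_timer_id = 1;	/* next id handed out by model_timeout() */
static int timers_pending = 0;	/* timers armed but not yet cancelled */

/* Models timeout(): arm a timer and return its nonzero id. */
static int
model_timeout(void)
{
	timers_pending++;
	return (next_timer_id++);
}

/* Models untimeout(): cancel a previously armed timer. */
static void
model_untimeout(int id)
{
	if (id != 0)
		timers_pending--;
}

/*
 * Models the duplicate-detected path (ip_ndp_excl()/ip_arp_excl()).
 * The assert mirrors the ASSERTs mentioned above: setting the id while
 * a timer is still pending would clobber it and leak that timer.
 */
static void
model_mark_duplicate(ipif_model_t *ipif)
{
	assert(ipif->recovery_id == 0);
	ipif->duplicate = true;
	ipif->recovery_id = model_timeout();
}

/*
 * Models ndp_do_recovery(): cancel any pending recovery timer first,
 * then clear the duplicate state (the ip_ndp_recover() step).
 */
static void
model_do_recovery(ipif_model_t *ipif)
{
	if (ipif->recovery_id != 0) {
		model_untimeout(ipif->recovery_id);
		ipif->recovery_id = 0;
	}
	ipif->duplicate = false;
}

/* Run several duplicate/recover rounds; return the leaked timer count. */
int
model_dup_recover_cycle(int rounds)
{
	ipif_model_t ipif = { false, 0 };

	for (int i = 0; i < rounds; i++) {
		model_mark_duplicate(&ipif);
		model_do_recovery(&ipif);
	}
	return (timers_pending);
}
```

If the cancel step in model_do_recovery() is removed, the second call to model_mark_duplicate() trips the assertion, which is the kind of early failure the added ASSERTs are meant to provide.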
