> I'm unclear why the added NCE would usually be in the unresolved state;
> ire_nce_init() does:

As Thirumalai pointed out, this is because the ire_fp_mp passed in is
NULL, and the reason it is NULL is that the fp_mp itself can be freed
(see ndp_fastpath_flush()).

> The problem that I see with the above is that if the NCE already exists
> but is not ND_REACHABLE, then we will not replace it with an ND_REACHABLE

Right, so the safest temporary fix (until we clean up the ip_newroute()
path) is to make sure that we do the following in ip_newroute for this
case (adding an ire_cache for an offlink host, based on the gateway's
ire_cache):

    nce->nce_state = ND_REACHABLE;
    nce_fastpath(nce);

with the returned nce.

> Yes, I overlooked this. I've done some testing and this is true in my
> bits -- and seems to be true in onnv as well. Is there a reason why
> ire_fp_mp has to be NULL?

I recall running into race conditions where the fastpath would delete
the nce in between the calls from ip_newroute and the ire* functions.

> I have not tested against onnv.

Ok. So I guess there are no tests in the ipmp test suite that trigger
this particular case.

> Is it possible that:
> 6508701 ire_add_v4() often adds unresolved IREs even when told not to
> ... is playing a role here? Specifically, before that fix, ire_add_v4()
> will add unresolved IREs regardless of the allow_unresolved flag. So

Nope, the root cause is different. Even if you add the ire, unless
someone kicked off ARP, the packets would never get sent.

> wouldn't that mask this bug? If so, seems like IPMP should be pretty
> broken in Nevada right now.

I believe this case might not be easily encountered, even with ipmp,
which is why we have not seen it... Does the ipmp test suite actually
trigger a case where we send packets to an offlink dst through various
interfaces in an ipmp group, and there is only 1 gateway on the lan?

--Sowmini
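
For concreteness, the temporary fix described above (force the returned nce to ND_REACHABLE, then kick the fastpath) can be sketched in isolation roughly as follows. This is only an illustration and not the actual ip_newroute() change: the nce_t layout, the numeric ND_* values, the sketch_fp_requested counter, the stubbed nce_fastpath(), and the helper name sketch_mark_offlink_nce_reachable() are all stand-ins assumed here so the fragment compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in state values; the real ones come from the kernel headers. */
enum { ND_INCOMPLETE = 1, ND_REACHABLE = 2 };

/* Stand-in for the kernel's nce_t; only the fields the sketch needs. */
typedef struct nce_s {
	int nce_state;			/* ND_* state of this entry */
	int sketch_fp_requested;	/* hypothetical: fastpath requests seen */
} nce_t;

/*
 * Stub standing in for the kernel routine of the same name. Here it only
 * records that a fastpath request was made for this nce.
 */
static void
nce_fastpath(nce_t *nce)
{
	nce->sketch_fp_requested++;
}

/*
 * Hypothetical helper: what ip_newroute would do with the returned nce
 * when adding an ire_cache for an offlink host based on the gateway's
 * ire_cache. An already-existing entry that is not ND_REACHABLE gets
 * promoted instead of being left unresolved.
 */
static void
sketch_mark_offlink_nce_reachable(nce_t *nce)
{
	assert(nce != NULL);
	nce->nce_state = ND_REACHABLE;
	nce_fastpath(nce);
}
```

Calling sketch_mark_offlink_nce_reachable() on an entry that starts out in ND_INCOMPLETE leaves it in ND_REACHABLE with exactly one fastpath request recorded. Locking and the race with ndp_fastpath_flush() are deliberately left out of the sketch.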
