Sowmini/Thiru,

I just got to the bottom of a packet loss issue I've been seeing with my
IPMP bits, and I'd like to better understand the problem by starting with
why it doesn't occur in Nevada.

The issue is with adding IRE_CACHE entries for off-link hosts when there
are multiple available IP interfaces (ills) in the IPMP group.  In this
case, the IRE_CACHE entry for the gateway will have an ire_stq
associated with one of the ills in the group.  However, because of IPMP's
round-robin output ill selection, the IRE_CACHE entry for the off-link
host may have an ire_stq that points to another ill in the group.  If they
do differ, then things break in my bits when we try to ire_add_v4() the
off-link IRE_CACHE entry: it calls ndp_lookup_v4() to look up the
gateway's NCE, but does so using the ill associated with the ire_stq of
the off-link IRE_CACHE entry rather than that of the gateway's entry.
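
To illustrate how the two entries can end up on different ills, here is a
toy user-space sketch of round-robin selection.  It is not the actual IPMP
code: toy_group and toy_next_ill() are made-up names, and the group is
assumed to hold just two ills.

```c
#include <stddef.h>

/* Toy IPMP group: a hypothetical stand-in, not the real ill structures. */
struct toy_group {
        const char *ills[2];    /* names of the ills in the group */
        size_t nills;           /* number of valid entries in ills[] */
        size_t next;            /* index of the ill to hand out next */
};

/* Hand out the group's ills in round-robin order. */
const char *
toy_next_ill(struct toy_group *grp)
{
        const char *ill = grp->ills[grp->next];

        grp->next = (grp->next + 1) % grp->nills;
        return (ill);
}
```

With a group of { "ce0", "ce1" }, the first call (say, for the gateway's
entry) hands out "ce0" and the second (for the off-link host's entry)
hands out "ce1", so the two ire_stq's need not agree.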

For instance, suppose we have a resolved IRE_CACHE entry for gateway G,
and its ire_stq refers to "ce0".  Now suppose someone tries to send a
packet to host H which needs to be reached through gateway G.  Suppose the
IRE_CACHE entry that ip_newroute() creates has an ire_stq that refers to
"ce1".  Eventually, we'll reach ire_add_v4(), which will attempt to look up
the NCE for gateway G, here:

        if (ire->ire_type & IRE_CACHE) {
                ASSERT(ire->ire_stq != NULL);
-->             nce = ndp_lookup_v4(ire_to_ill(ire),
                    ((ire->ire_gateway_addr != INADDR_ANY) ?
                    &ire->ire_gateway_addr : &ire->ire_addr),
                    B_TRUE);
                if (nce != NULL)
                        mutex_enter(&nce->nce_lock);

However, since `ire' is the IRE_CACHE entry for H, ire_to_ill() will
return ce1's ill.  The resolved NCE for G is on ce0's ill, so the
ndp_lookup_v4() fails and the packet is dropped.

For now, I've worked around this in my code by looking up the IRE_CACHE
entry for G and passing that ire to ire_to_ill() above, but as Nevada
doesn't need to do this, I'd like to understand what I'm missing here.
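
To make the mismatch and the workaround concrete, here is a toy user-space
model.  It is only a sketch: the toy_* types and functions are made up for
illustration and are not the real ip structures or the actual change in my
bits.  It assumes NCEs are matched on the (ill, address) pair, which is
what the failure above implies.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical stand-ins, not the real ip structures. */
struct toy_nce {
        uint32_t addr;          /* resolved IPv4 address */
        int ill;                /* index of the ill the NCE lives on */
};

struct toy_ire {
        uint32_t addr;          /* destination address */
        uint32_t gateway;       /* gateway address, 0 if on-link */
        int stq_ill;            /* index of the ill behind ire_stq */
};

/* Per-ill lookup: an NCE matches only if it is on the given ill. */
const struct toy_nce *
toy_nce_lookup(const struct toy_nce *tbl, size_t n, int ill, uint32_t addr)
{
        size_t i;

        for (i = 0; i < n; i++) {
                if (tbl[i].ill == ill && tbl[i].addr == addr)
                        return (&tbl[i]);
        }
        return (NULL);
}

/* What fails: look up the next hop's NCE on the new IRE's own ill. */
const struct toy_nce *
toy_lookup_own_ill(const struct toy_nce *tbl, size_t n,
    const struct toy_ire *ire)
{
        uint32_t nexthop = (ire->gateway != 0) ? ire->gateway : ire->addr;

        return (toy_nce_lookup(tbl, n, ire->stq_ill, nexthop));
}

/* The workaround: use the ill of the gateway's cache entry instead. */
const struct toy_nce *
toy_lookup_gw_ill(const struct toy_nce *tbl, size_t n,
    const struct toy_ire *ire, const struct toy_ire *gw_ire)
{
        uint32_t nexthop = (ire->gateway != 0) ? ire->gateway : ire->addr;

        return (toy_nce_lookup(tbl, n, gw_ire->stq_ill, nexthop));
}
```

With G's NCE on ce0 and H's entry pointing at ce1, toy_lookup_own_ill()
returns NULL for H, while toy_lookup_gw_ill() finds G's NCE.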

Thanks for clues,
-- 
meem
