On Wed, Dec 23, 2015 at 10:04 PM, Doug Ledford <dledf...@redhat.com> wrote: > On 10/15/2015 12:58 PM, Hefty, Sean wrote: >>>>> ib_create_ah_from_wc needs to resolve the DMAC in order to create the >>>>> AH (this may result sending an ARP and waiting for response). >>>>> CM uses this function (which is now sleepable). >>>> >>>> This is a significant change to the CM. The CM calls are invoked >>> assuming that they return relatively quickly. They're invoked from >>> callbacks and internally. Having the calls now wait for an ARP response >>> requires that this be re-architected, so the calling thread doesn't go out >>> to lunch for several seconds. >>> >>> Agree - this is a significant change, but it was done a long time ago >>> (at v4.3 if I recall). When we need to send a message we need to >> >> We're at 4.3-rc5? >> >>> figure out the destination MAC. Even the passive side needs to do that >>> as some vendors don't report the source MAC of the packet in their wc. >>> Even if they did, since IP based addressing is rout-able by its >>> nature, it should follow the networking stack rules. Some crazy >>> configurations could force sending responses to packets that came from >>> router1 to router2 - so we have no choice than resolving the DMAC at >>> every side. >> >> Ib_create_ah_from_wc is broken. It is now an asynchronous operation, only >> the call itself was left as synchronous. We can't block kernel threads for >> a minute, or however long ARP takes to resolve. The call itself must change >> to be async, and all users of it updated to allocate some request, queue it, >> and handle all race conditions that result -- such as state changes or >> destruction of the work that caused the request to be initiated. >> > > I don't know who had intended to address this, but it got left out of > the 4.4 work. We need to not let this drop through the cracks (for > another release). Can someone please put fixing this properly on their > TODO list? >
IMHO, the proposed patch makes things better. Not applying the current patch means we have a "sleeping while atomic" error (in addition to the fact that kernel threads could wait until the ARP process finishes), which is pretty bad. I tend to agree that adding another CM state is probably a better approach, but unless someone steps up and add this for v4.5, I think that's the best thing we have. > -- > Doug Ledford <dledf...@redhat.com> > GPG KeyID: 0E572FDD > > Matan -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html