On Wed, Dec 23, 2015 at 10:04 PM, Doug Ledford <dledf...@redhat.com> wrote:
> On 10/15/2015 12:58 PM, Hefty, Sean wrote:
>>>>> ib_create_ah_from_wc needs to resolve the DMAC in order to create the
>>>>> AH (this may result sending an ARP and waiting for response).
>>>>> CM uses this function (which is now sleepable).
>>>>
>>>> This is a significant change to the CM.  The CM calls are invoked
>>> assuming that they return relatively quickly.  They're invoked from
>>> callbacks and internally.  Having the calls now wait for an ARP response
>>> requires that this be re-architected, so the calling thread doesn't go out
>>> to lunch for several seconds.
>>>
>>> Agree - this is a significant change, but it was done a long time ago
>>> (at v4.3 if I recall). When we need to send a message we need to
>>
>> We're at 4.3-rc5?
>>
>>> figure out the destination MAC. Even the passive side needs to do that
>>> as some vendors don't report the source MAC of the packet in their wc.
>>> Even if they did, since IP based addressing is rout-able by its
>>> nature, it should follow the networking stack rules. Some crazy
>>> configurations could force sending responses to packets that came from
>>> router1 to router2 - so we have no choice than resolving the DMAC at
>>> every side.
>>
>> Ib_create_ah_from_wc is broken.   It is now an asynchronous operation, only 
>> the call itself was left as synchronous.  We can't block kernel threads for 
>> a minute, or however long ARP takes to resolve.  The call itself must change 
>> to be async, and all users of it updated to allocate some request, queue it, 
>> and handle all race conditions that result -- such as state changes or 
>> destruction of the work that caused the request to be initiated.
>>
>
> I don't know who had intended to address this, but it got left out of
> the 4.4 work.  We need to not let this drop through the cracks (for
> another release).  Can someone please put fixing this properly on their
> TODO list?
>

IMHO, the proposed patch makes things better. Not applying the current
patch means we have a "sleeping while atomic" error (in addition to
the fact that kernel threads could wait until the ARP process
finishes), which is pretty bad. I tend to agree that adding another CM
state is probably a better approach, but unless someone steps up and
add this for v4.5, I think that's the best thing we have.

> --
> Doug Ledford <dledf...@redhat.com>
>               GPG KeyID: 0E572FDD
>
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to