Hello Roland,

> > Do the post send work request whenever receiving DREQ, which meant the
> > remote TX QP has already teared down, there is no new post recv
> > completions any more after DREQ.
>
> After a DREQ is received, then the local QP is transitioned to the error
> state. However, we don't know when all the receives queued up have
> completed (with flush error status). Also, we may want to clean up a QP
> when we didn't receive a DREQ (remote side crashed, or we just have an
> idle connection).

There are three scenarios in which we need to destroy QPs:

1. During connection establishment
2. After receiving a DREQ
3. Idle connection, or the remote side crashed or shut down

Currently, IPoIB-CM uses the same approach to destroy the QP in all of
these cases: it moves the QP to the error state, waits for the LAST WQE
async event, and then destroys the QP from the context of the last posted
send work request. However, moving the QP to the error state can itself
return an error, in which case the async event might never be generated.

What I want to propose here:

1. During connection establishment (e.g. REJ or REP failure): destroy the
   QP immediately, without moving it to the error state, since the
   connection has not reached RTU.
2. Received DREQ: destroy the QP in the context of the last posted send
   work request; we don't need to rely on the async event.
3. Idle connection, or remote side crashed or shut down: move the QP to
   the error state, then destroy it later.

And if we run out of connections, we can garbage-collect the least
recently used connection to make room for the new one.

Then we have a common approach for both the non-SRQ and SRQ cases, and we
can remove the async event handler.

I have tested the above patches, but I would like to hear your thoughts
before submitting them.

Thanks
Shirley
_______________________________________________
general mailing list
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
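
As a rough illustration of the proposal above, the three teardown cases and
the least-recently-used fallback could be modeled as below. This is a
standalone sketch, not the patch itself: every name in it (teardown_reason,
teardown_action, pick_teardown_action, struct conn, pick_lru_victim) is
hypothetical, it makes no real verbs or CM calls, and it assumes a simple
monotonically increasing last_used timestamp that never wraps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model only; none of these names come from the IPoIB-CM code. */
enum teardown_reason {
	TEARDOWN_ESTABLISH_FAILED,	/* REJ or REP failure, RTU never reached */
	TEARDOWN_DREQ_RECEIVED,		/* remote side sent a DREQ */
	TEARDOWN_IDLE_OR_DEAD,		/* idle, remote crashed, or shutdown */
};

enum teardown_action {
	ACTION_DESTROY_NOW,		/* destroy without moving the QP to error */
	ACTION_DESTROY_ON_LAST_SEND,	/* destroy in last posted send WR context */
	ACTION_ERROR_THEN_DESTROY,	/* move the QP to error, destroy later */
};

/* Map each of the three scenarios to the teardown path proposed for it. */
enum teardown_action pick_teardown_action(enum teardown_reason reason)
{
	switch (reason) {
	case TEARDOWN_ESTABLISH_FAILED:
		return ACTION_DESTROY_NOW;
	case TEARDOWN_DREQ_RECEIVED:
		return ACTION_DESTROY_ON_LAST_SEND;
	case TEARDOWN_IDLE_OR_DEAD:
		return ACTION_ERROR_THEN_DESTROY;
	}
	assert(!"unknown teardown reason");
	return ACTION_ERROR_THEN_DESTROY;
}

/* Hypothetical per-connection bookkeeping for the GC fallback. */
struct conn {
	int in_use;			/* non-zero if the slot holds a live connection */
	unsigned long last_used;	/* assumed timestamp of last activity */
};

/*
 * Return the index of the least recently used live connection,
 * or -1 if no slot is in use.
 */
int pick_lru_victim(const struct conn *conns, size_t n)
{
	int victim = -1;

	for (size_t i = 0; i < n; i++) {
		if (!conns[i].in_use)
			continue;
		if (victim < 0 || conns[i].last_used < conns[victim].last_used)
			victim = (int)i;
	}
	return victim;
}
```

In this model a caller would first pick the action from the reason for the
teardown, and would only call pick_lru_victim() when a new connection cannot
be set up because no slots are left.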
