On Nov 27, 2007 3:13 PM, Steve Wise <[EMAIL PROTECTED]> wrote:
>
> Caitlin Bestler wrote:
> > On Nov 27, 2007 6:54 AM, Kanevsky, Arkady <[EMAIL PROTECTED]> wrote:
> >> ULP can post recvs before connection is established but not to send
> >> queue prior to connection establishment.
> >>
> >
> >
> > ULP can post sends only after it is notified that the connection is
> > established.
> >
> > The issue is when the iWARP layer can issue this notification.
> >
> > If the MPA layer implements fencing on its own, then the notification can
> > be provided immediately after the MPA Request/Response exchange.
> >
> > If not, it must wait for the first MPA frame. The problem is that
> > implementations that adhere to closely to the RDMAC verbs can obtain
> > no information about the connection unless there is a CQE producing event.
>
> The idea for this "hack" is that the passive side (the side that sends
> the MPA response) will hold off posting the ESTABLISHED event to the
> rdma-cm ULP until after it receives this 0B Read Request from the client...
>
The problem is that this solution is being applied at the wrong layer.
MPA is not the source of the problem; the RDMAC verbs layer is. The
solution needs to be a verbs-layer solution, not an MPA-layer solution.

Steve's last comment states the problem well: we are trying to enable
the Verbs layer on the Passive side to generate the Established event,
and if at all possible to do so in a way that places no requirements
on the application layer.

I believe it is possible to do so without making any modifications to
MPA.

The MPA protocol requirement is a safeguard against receiving an MPA
Frame before the MPA Response frame. MPA does not have or need an RTR
message, because the MPA RFC allows *any* MPA frame from the active
side to effectively acknowledge receipt of the MPA Response. That
includes a zero-length RDMA Write.

An iWARP implementation can (perhaps SHOULD) implement an "MPA Fenced"
state on the passive side that is cleared on receipt of any MPA frame.
With such an "MPA Fence" feature, the CM layer can generate the
"Connection Established" event as soon as it sends the MPA Response,
and the Passive-side ULP will be able to post to the SQ; the messages
just won't go out on the wire until something is received.

Meanwhile the Active side must ensure that *some* MPA frame is sent
immediately after the MPA Response is received. If it has traffic
ready to go it can simply send that. If it does not, it can use a
zero-length write. A zero-length write is totally transparent to the
ULP at both ends.

But that will only work for *some* implementations. On others a
zero-length RDMA Read is needed to unjam things. That's almost
transparent, but not totally so, since it temporarily uses an RDMA Read
credit. And while nobody has spoken up to say *they* have that problem,
I would not be surprised if there are implementations where nothing
less than a full ULP "nop" message will suffice.
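To make the "MPA Fenced" idea concrete, here is a toy sketch in Python
of the passive-side behaviour described above. It is only a model under
my own assumptions, not code from any iWARP stack: the class and method
names (PassiveSideQP, send_mpa_response, post_send, on_mpa_frame) are
hypothetical, and the "wire" is just a list.

```python
from collections import deque


class PassiveSideQP:
    """Toy model of a passive-side QP with an "MPA Fenced" state.

    All names here are hypothetical; this models the behaviour only.
    """

    def __init__(self):
        self.established = False  # has the ULP been told "Established"?
        self.fenced = False       # True until any MPA frame arrives
        self.pending = deque()    # SQ work requests held back by the fence
        self.wire = []            # stand-in for what is actually transmitted

    def send_mpa_response(self):
        # The Established event is reported as soon as the MPA Response
        # is sent; the fence keeps SQ traffic off the wire for now.
        self.wire.append("MPA Response")
        self.fenced = True
        self.established = True

    def post_send(self, work_request):
        if not self.established:
            raise RuntimeError("cannot post to the SQ before Established")
        if self.fenced:
            self.pending.append(work_request)  # accepted, not yet sent
        else:
            self.wire.append(work_request)

    def on_mpa_frame(self, frame):
        # *Any* MPA frame from the active side clears the fence,
        # including a zero-length RDMA Write.
        if self.fenced:
            self.fenced = False
            while self.pending:
                self.wire.append(self.pending.popleft())
```

The point of the sketch is that the ULP never sees the fence: post_send
succeeds right after the Established event, and the queued work simply
drains, in order, when the first frame from the active side shows up.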
So keeping the fix at the verbs layer, and allowing the minimal extra
effort to be controlled by the Passive side itself, suggests that the
Passive side simply encode its MPA-unjam-action-required in the OFA
standardized portion of the Private Data. Encodings would include:

- Any MPA Frame, including a zero-length RDMA Write, will unjam the
  passive side SendQ.
- An untagged message or a zero-length RDMA Read will work.
- Only an untagged message will work.

In the latter cases the middleware will have to play games with
stand-in receive WQEs, posting the actual receive WQEs to the QP only
after the MPA fence has been unjammed. That isn't pretty, but if your
hardware is fixed then it's either that or make the application deal
with the problem. I have a hunch that the MPI developers would not
like that option at all.

How this differs from what Arkady proposed is that it avoids making
any changes to MPA, but instead only makes use of the OFA defined
portion of the Private Data. Further, it allows use of a zero-length
RDMA Write when that is sufficient to break the MPA logjam. A
zero-length RDMA Write, unlike a zero-length RDMA Read, is *totally*
transparent to the ULP.
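As a rough illustration of the encoding idea, the sketch below (Python,
purely hypothetical) carries the required unjam action in one leading
byte of the private data and lets the active side pick the least
intrusive frame when it has no ULP traffic of its own. The numeric
values, the one-byte layout, and all names are assumptions of mine;
nothing here reflects an actual OFA-defined private data format.

```python
from enum import IntEnum


class UnjamAction(IntEnum):
    """What the passive side needs to clear its MPA fence (values assumed)."""
    ANY_MPA_FRAME = 0                 # a zero-length RDMA Write is enough
    UNTAGGED_OR_ZERO_LENGTH_READ = 1  # zero-length RDMA Read also works
    UNTAGGED_ONLY = 2                 # nothing less than an untagged message


def encode_private_data(action, ulp_private_data=b""):
    # Hypothetical layout: one byte for the action, then the ULP's data.
    return bytes([action]) + ulp_private_data


def decode_private_data(data):
    if not data:
        raise ValueError("private data carries no unjam action")
    # UnjamAction() raises ValueError for an unknown encoding.
    return UnjamAction(data[0]), data[1:]


def frame_to_send_when_idle(action):
    """Least intrusive frame the active side can send with no traffic queued."""
    if action is UnjamAction.ANY_MPA_FRAME:
        return "zero-length RDMA Write"  # totally transparent to the ULP
    if action is UnjamAction.UNTAGGED_OR_ZERO_LENGTH_READ:
        return "zero-length RDMA Read"   # briefly uses an RDMA Read credit
    return "untagged message"            # needs stand-in receive WQEs
```

The active side would decode the action from the private data that
arrives with the MPA Response and, only if it has nothing of its own to
send, fall back to whatever frame_to_send_when_idle returns.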
