After discussing this issue with Jeff via private e-mail, I would like to open it up to the group for further discussion.

Issue (as described by Steve Wise):

Currently OMPI uses QP 0 for all credit updates (by design). This breaks when running over the Chelsio RNIC due to a race condition: the availability of a buffer is advertised on QP 0 while the buffer itself was posted on one of the other QPs. It is possible (and easily reproducible) that the peer gets the advertisement and sends data into the QP in question _before_ the RNIC has processed the recv buffer and made it available for placement. This results in a connection termination.

BTW, other HCAs have this issue too. ehca, for example, reportedly has the same race condition. I think the timing hole is much smaller, though, for devices that have two separate work queues for the SQ and RQ of a QP. Chelsio uses a single work queue to implement both the SQ and the RQ, so RQ work requests get queued up behind pending SQ entries, which can make this race condition more prevalent.

I don't know of any way to avoid this issue other than to ensure that all credit updates for QP X are posted only on QP X. If we do this, the Chelsio HW/FW ensures that the RECV is posted before the subsequent send operation that advertises the buffer is processed.

To address this, Jeff Squyres recommends:

1. Make an MCA parameter that governs this behavior (i.e., whether to send all flow control messages on QP 0 or on their respective QPs).
2. Extend the ini file parsing code to accept this parameter as well (need to add a strcmp or two).
3. Extend the ini file to fill in this value for all the NICs listed (including yours).
4. Extend the logic in the rest of the BTL to send the flow control messages either across QP 0 or the respective QP, depending on the value of the MCA param / ini value.

I am happy to do the work to enable this, but I would like to get everyone's feedback before I start down this path. Jeff said Gleb did the work to change openib to behave this way, so any insight would be helpful.

Thanks,
Jon
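
P.S. To make items 1, 2 and 4 of the proposal a bit more concrete, here is a rough sketch of the selection logic. It is only an illustration: the names credit_policy_t, parse_credit_policy and credit_update_qp, and the value strings "qp0" and "own", are all made up for this example and are not existing openib BTL identifiers or ini keys.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical policy values; these are NOT existing Open MPI names. */
typedef enum {
    CREDIT_QP_ZERO = 0,  /* current behavior: all credit updates go out on QP 0 */
    CREDIT_QP_OWN  = 1   /* proposed: credit updates for QP X go out on QP X */
} credit_policy_t;

/* Parse a policy string as it might appear in an MCA parameter or an
 * ini file field (the value strings are assumptions for this sketch).
 * Returns false and leaves *out untouched on unknown input. */
static bool parse_credit_policy(const char *value, credit_policy_t *out)
{
    if (value == NULL || out == NULL) {
        return false;
    }
    if (strcmp(value, "qp0") == 0) {
        *out = CREDIT_QP_ZERO;
        return true;
    }
    if (strcmp(value, "own") == 0) {
        *out = CREDIT_QP_OWN;
        return true;
    }
    return false;
}

/* Pick the QP index on which to send the credit update for a receive
 * buffer that was posted on buffer_qp. */
static int credit_update_qp(credit_policy_t policy, int buffer_qp)
{
    return (policy == CREDIT_QP_OWN) ? buffer_qp : 0;
}
```

With the "own" policy, a credit update for a buffer posted on QP 3 would be sent on QP 3, so the device sees the RECV and the advertising send on the same queue pair; with "qp0" it would go out on QP 0 as it does today.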