On Tue, Oct 12, 2010 at 8:50 PM, Ralph Campbell <[email protected]> wrote:
> On Tue, 2010-10-12 at 11:38 -0700, Bart Van Assche wrote:
>> Hello,
>>
>> Has anyone already tried to process the work completions generated by
>> a HCA after the state of a queue pair has been changed to IB_QPS_ERR ?
>> With the hardware/firmware/driver combination I have tested I have
>> observed the following:
>> * Multiple completions with the same wr_id and nonzero (error) status
>> were received by the application, while all work requests queued with
>> the flag IB_SEND_SIGNALED had a unique wr_id.
>> * Completions with non-zero (error) status and a wr_id / opcode
>> combination were received that were never queued by the application.
>> Note: some work requests were queued with and some without the flag
>> IB_SEND_SIGNALED. I'm not sure however whether that has anything to do
>> with the observed behavior.
>>
>> This behavior is easy to reproduce. If I interpret the InfiniBand
>> Architecture Specification correctly, this behavior is non-compliant.
>>
>> Has anyone been looking into this before ?
>
> I haven't seen it. It isn't supposed to happen.
>
> What hardware and software are you using and how do you
> reproduce it?

Hello Ralph and Or,

I reproduce this behavior by changing the state of a queue pair to
IB_QPS_ERR while RDMA is ongoing. The application is multithreaded and
performs RDMA by calling ib_post_recv() and ib_post_send() with the
opcodes IB_WR_SEND, IB_WR_RDMA_READ and IB_WR_RDMA_WRITE.

I have observed this with the mlx4 driver, a ConnectX HCA and firmware
version 2.7.0.

Bart.
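
For anyone who wants to check for the same symptoms, here is a minimal sketch of the bookkeeping that can flag duplicate or unknown wr_id values while draining a completion queue. It is not the code from the application described above. The names wr_tracker, tracker_post, tracker_complete and MAX_OUTSTANDING are hypothetical, and the sketch makes no verbs calls itself: the assumption is that the caller invokes tracker_post() before each ib_post_send() / ib_post_recv() and tracker_complete() for each entry returned by ib_poll_cq(). It also assumes every posted work request is recorded, whether or not IB_SEND_SIGNALED was set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OUTSTANDING 64 /* hypothetical queue depth, not from the original report */

enum wc_verdict { WC_SEEN_ONCE, WC_DUPLICATE, WC_UNKNOWN };

struct wr_slot {
    uint64_t wr_id;
    bool completed; /* true once a completion was seen for this wr_id */
};

struct wr_tracker {
    struct wr_slot slots[MAX_OUTSTANDING];
    size_t count;
};

static void tracker_init(struct wr_tracker *t)
{
    assert(t != NULL);
    t->count = 0;
}

/* Record a work request just before it is posted. Returns false when full. */
static bool tracker_post(struct wr_tracker *t, uint64_t wr_id)
{
    assert(t != NULL);
    if (t->count == MAX_OUTSTANDING)
        return false;
    t->slots[t->count++] = (struct wr_slot){ .wr_id = wr_id, .completed = false };
    return true;
}

/* Classify one completion by its wr_id. */
static enum wc_verdict tracker_complete(struct wr_tracker *t, uint64_t wr_id)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->slots[i].wr_id != wr_id)
            continue;
        if (t->slots[i].completed)
            return WC_DUPLICATE; /* second completion for the same wr_id */
        t->slots[i].completed = true;
        return WC_SEEN_ONCE;
    }
    return WC_UNKNOWN; /* wr_id was never recorded as posted */
}
```

In a multithreaded application the tracker would need to be protected by a lock (or kept per queue pair and touched from a single context), since posting and polling may run concurrently. The linear scan is only meant to keep the sketch short.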
