On Tue, Oct 12, 2010 at 08:58:59PM +0200, Bart Van Assche wrote:
> On Tue, Oct 12, 2010 at 8:50 PM, Ralph Campbell
> <[email protected]> wrote:
> > On Tue, 2010-10-12 at 11:38 -0700, Bart Van Assche wrote:
> >> Hello,
> >>
> >> Has anyone already tried to process the work completions generated by
> >> a HCA after the state of a queue pair has been changed to IB_QPS_ERR ?
> >> With the hardware/firmware/driver combination I have tested I have
> >> observed the following:
> >> * Multiple completions with the same wr_id and nonzero (error) status
> >> were received by the application, while all work requests queued with
> >> the flag IB_SEND_SIGNALED had a unique wr_id.
I assume your QP is configured for selective signalling, right? This
means that a work request posted without IB_SEND_SIGNALED generates no
completion when it is processed successfully. For an unsuccessful WR,
however, the hardware should generate a completion. For these cases it
is worth having a meaningful wr_id.
> >> * Completions with non-zero (error) status and a wr_id / opcode
> >> combination were received that were never queued by the application.
In case of error the opcode of the completed operation is not provided.
I am not sure why.
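
One way an application could cope with the missing opcode is to remember
what it posted, keyed by wr_id, and look that record up when an error
completion arrives. Below is a minimal sketch in plain C, not tied to any
driver. The names `app_opcode`, `pending_wr`, `record_post()` and
`lookup_opcode()` are hypothetical and not part of the verbs API, the table
size is arbitrary, and the sketch assumes wr_id values are unique among
outstanding work requests.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical opcode tags kept by the application; not the verbs enums. */
enum app_opcode {
    APP_OP_NONE = 0,
    APP_OP_SEND,
    APP_OP_RDMA_READ,
    APP_OP_RDMA_WRITE
};

#define MAX_PENDING 64  /* arbitrary table size for this sketch */

struct pending_wr {
    uint64_t wr_id;
    enum app_opcode opcode;
    int in_use;
};

static struct pending_wr pending[MAX_PENDING];

/* Remember the opcode of a work request just before posting it.
 * Returns 0 on success, -1 if the table is full. */
int record_post(uint64_t wr_id, enum app_opcode opcode)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            pending[i].wr_id = wr_id;
            pending[i].opcode = opcode;
            pending[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* On a completion, recover the opcode from wr_id alone and free the slot.
 * Returns APP_OP_NONE if wr_id was never recorded, which is the case
 * worth logging when chasing unexpected completions. */
enum app_opcode lookup_opcode(uint64_t wr_id)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use && pending[i].wr_id == wr_id) {
            pending[i].in_use = 0;
            return pending[i].opcode;
        }
    }
    return APP_OP_NONE;
}
```

Calling record_post() right before each post and lookup_opcode() for each
polled completion would also flag a wr_id that shows up twice, since the
second lookup returns APP_OP_NONE.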

> >> Note: some work requests were queued with and some without the flag
> >> IB_SEND_SIGNALED. I'm not sure however whether that has anything to do
> >> with the observed behavior.
If you have WRs for which you did not set IB_SEND_SIGNALED, they are
not considered completed until a completion entry is pushed to the CQ
that corresponds to that send queue. I am not sure whether this means
that all the WRs in the send queue should be completed with error.
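
To illustrate the bookkeeping this implies, here is a rough sketch of an
application-side shadow of one send queue. It assumes that completions for
a send queue are reported in posting order, so a completion for one WR also
retires every older WR that was posted unsignaled. The names `sq_shadow`,
`sq_post()` and `sq_complete()` are hypothetical, the depth is arbitrary,
and wr_id values are assumed unique among outstanding WRs.

```c
#include <stdint.h>

#define SQ_DEPTH 16  /* arbitrary send queue depth for this sketch */

/* Hypothetical application-side shadow of one send queue. */
struct sq_shadow {
    uint64_t wr_id[SQ_DEPTH];
    unsigned head;   /* index of the oldest outstanding WR */
    unsigned count;  /* number of outstanding WRs */
};

/* Record a WR (signaled or not) in posting order.
 * Returns 0 on success, -1 if the shadow queue is full. */
int sq_post(struct sq_shadow *sq, uint64_t wr_id)
{
    if (sq->count == SQ_DEPTH)
        return -1;
    sq->wr_id[(sq->head + sq->count) % SQ_DEPTH] = wr_id;
    sq->count++;
    return 0;
}

/* A completion for wr_id arrived: retire it together with every older
 * WR, since those were posted earlier on the same send queue.
 * Returns the number of WRs retired, or -1 if wr_id is not outstanding
 * (a completion the application never queued, or a duplicate). */
int sq_complete(struct sq_shadow *sq, uint64_t wr_id)
{
    for (unsigned n = 0; n < sq->count; n++) {
        if (sq->wr_id[(sq->head + n) % SQ_DEPTH] == wr_id) {
            sq->head = (sq->head + n + 1) % SQ_DEPTH;
            sq->count -= n + 1;
            return (int)(n + 1);
        }
    }
    return -1;
}
```

A return value of -1 from sq_complete() corresponds to the two symptoms
reported above: a wr_id seen more than once, or one that was never posted.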
> >>
> >> This behavior is easy to reproduce. If I interpret the InfiniBand
> >> Architecture Specification correctly, this behavior is non-compliant.
> >>
> >> Has anyone been looking into this before ?
> >
> > I haven't seen it. It isn't supposed to happen.
> >
> > What hardware and software are you using and how do you
> > reproduce it?
> 
> Hello Ralph and Or,
> 
> The way I reproduce that behavior is by modifying the state of a queue
> pair into IB_QPS_ERR while RDMA is ongoing. The application, which is
> multithreaded, performs RDMA by calling ib_post_recv() and
> ib_post_send() (opcodes IB_WR_SEND, IB_WR_RDMA_READ and
> IB_WR_RDMA_WRITE). This has been observed with the mlx4 driver, a
> ConnectX HCA and firmware version 2.7.0.
> 
> Bart.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html