Well, I said the "PEER" process may have destroyed the remote QP. The process calling ibv_poll_cq() still has its QP in RTS state.
And although I use OFED 1.3, the HCA is not ConnectX. Any idea?

--CQ

> -----Original Message-----
> From: Jack Morgenstein [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 26, 2008 2:01 AM
> To: general@lists.openfabrics.org
> Cc: Tang, Changqing; Roland Dreier
> Subject: Re: [ofa-general] error with ibv_poll_cq() call
>
> On Wednesday 26 March 2008 05:56, Tang, Changqing wrote:
> >
> > Hi,
> > We are debugging our dynamic process code, when we call
> >
> > ret = ibv_poll_cq(cq_hndl, 1, &compl);
> >
> > The peer process may have destroyed the QP.
> >
> > However, ibv_poll_cq() return -2 in 'ret', 'errno' is still 0
> >
> > What could be the reason for this error ?
> >
> > There is a posted send pending for completion, so error should be
> > reported via the completion status, not the polling function itself.
> >
> > Thanks for any help. This is OFED 1.3
>
> Roland,
> It looks like we have a race condition in mlx4_destroy_qp.
> We clean the cq BEFORE modifying the QP to reset (done in
> kernel as part of the ibv_cmd_destroy_qp() flow).
>
> CQ's problem has exposed this bug. mlx4_cq_clean needs to be invoked
> **after** the destroy:
>
> Index: libmlx4/src/verbs.c
> ===================================================================
> --- libmlx4.orig/src/verbs.c 2008-03-26 09:00:08.000000000 +0200
> +++ libmlx4/src/verbs.c 2008-03-26 09:00:52.449586000 +0200
> @@ -558,11 +558,6 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp)
>      struct mlx4_qp *qp = to_mqp(ibqp);
>      int ret;
>
> -    mlx4_cq_clean(to_mcq(ibqp->recv_cq), ibqp->qp_num,
> -                  ibqp->srq ? to_msrq(ibqp->srq) : NULL);
> -    if (ibqp->send_cq != ibqp->recv_cq)
> -        mlx4_cq_clean(to_mcq(ibqp->send_cq),
>                        ibqp->qp_num, NULL);
> -
>      mlx4_lock_cqs(ibqp);
>      mlx4_clear_qp(to_mctx(ibqp->context), ibqp->qp_num);
>      mlx4_unlock_cqs(ibqp);
> @@ -576,6 +571,11 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp)
>          return ret;
>      }
>
> +    mlx4_cq_clean(to_mcq(ibqp->recv_cq), ibqp->qp_num,
> +                  ibqp->srq ? to_msrq(ibqp->srq) : NULL);
> +    if (ibqp->send_cq != ibqp->recv_cq)
> +        mlx4_cq_clean(to_mcq(ibqp->send_cq), ibqp->qp_num,
> +                      NULL);
> +
>      if (!ibqp->srq && ibqp->qp_type != IBV_QPT_XRC)
>          mlx4_free_db(to_mctx(ibqp->context),
>                       MLX4_DB_TYPE_RQ, qp->db);
>      free(qp->sq.wrid);
>
>
>

_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
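
The ordering problem Jack describes in mlx4_destroy_qp can be illustrated with a toy model in plain C. This is only a sketch under stated assumptions: toy_cq, toy_qp, hw_post_completion and the other names are hypothetical and are not libmlx4 or libibverbs APIs. toy_cq_clean stands in for mlx4_cq_clean, and toy_qp_reset stands in for the kernel moving the QP to reset during destroy. The model assumes the hardware can keep writing completions for a QP until that QP is reset, which is the race window the patch appears to close.

```c
#include <stdbool.h>

#define TOY_CQ_CAP 8

/* Hypothetical toy CQ: each entry records only the QP number it belongs to. */
struct toy_cq {
    unsigned qpn[TOY_CQ_CAP];
    int count;
};

/* Hypothetical toy QP: while "live", the simulated hardware may still
 * generate completions for it. */
struct toy_qp {
    unsigned qpn;
    bool live;
};

/* Simulated hardware: adds a completion only if the QP has not been reset. */
static void hw_post_completion(struct toy_cq *cq, const struct toy_qp *qp)
{
    if (qp->live && cq->count < TOY_CQ_CAP)
        cq->qpn[cq->count++] = qp->qpn;
}

/* Stand-in for mlx4_cq_clean: drop every entry that belongs to qpn. */
static void toy_cq_clean(struct toy_cq *cq, unsigned qpn)
{
    int kept = 0;
    for (int i = 0; i < cq->count; i++)
        if (cq->qpn[i] != qpn)
            cq->qpn[kept++] = cq->qpn[i];
    cq->count = kept;
}

/* Stand-in for the kernel resetting the QP as part of destroy. */
static void toy_qp_reset(struct toy_qp *qp)
{
    qp->live = false;
}

/* Count entries that still refer to qpn. */
static int stale_entries(const struct toy_cq *cq, unsigned qpn)
{
    int n = 0;
    for (int i = 0; i < cq->count; i++)
        if (cq->qpn[i] == qpn)
            n++;
    return n;
}

/* Old order: clean the CQ, then reset the QP. A completion that arrives
 * between the two steps is never cleaned. Returns the stale entry count. */
int destroy_clean_first(void)
{
    struct toy_cq cq = { .count = 0 };
    struct toy_qp qp = { .qpn = 7, .live = true };

    hw_post_completion(&cq, &qp);   /* completion already in the CQ */
    toy_cq_clean(&cq, qp.qpn);      /* clean happens first */
    hw_post_completion(&cq, &qp);   /* late completion in the race window */
    toy_qp_reset(&qp);              /* QP stops generating completions */
    return stale_entries(&cq, qp.qpn);
}

/* Patched order: reset the QP, then clean the CQ. The same late
 * completion lands before the clean and is removed with the rest. */
int destroy_reset_first(void)
{
    struct toy_cq cq = { .count = 0 };
    struct toy_qp qp = { .qpn = 7, .live = true };

    hw_post_completion(&cq, &qp);   /* completion already in the CQ */
    hw_post_completion(&cq, &qp);   /* the same late completion */
    toy_qp_reset(&qp);              /* QP stops generating completions */
    hw_post_completion(&cq, &qp);   /* ignored: QP is no longer live */
    toy_cq_clean(&cq, qp.qpn);      /* clean happens last */
    return stale_entries(&cq, qp.qpn);
}
```

In this model destroy_clean_first() leaves one entry behind for the destroyed QP, while destroy_reset_first() leaves none. Whether a leftover entry of this kind is what makes ibv_poll_cq() return -2 in the case reported above is not established by this thread, since the reporting process still has its own QP in RTS on a non-ConnectX HCA.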