On Wednesday 26 March 2008 05:56, Tang, Changqing wrote:
> 
> Hi,
>         We are debugging our dynamic process code, when we call
> 
> ret = ibv_poll_cq(cq_hndl, 1, &compl);
> 
> The peer process may have destroyed the QP.
> 
> However, ibv_poll_cq() returns -2 in 'ret', and 'errno' is still 0.
> 
> What could be the reason for this error?
> 
> There is a posted send pending completion, so the error should be
> reported via the completion status, not by the polling function
> itself.
> 
> Thanks for any help. This is OFED 1.3

Roland,
It looks like we have a race condition in mlx4_destroy_qp().  We clean the
CQ BEFORE the QP is modified to reset (which is done in the kernel as part
of the ibv_cmd_destroy_qp() flow), so the QP can still generate completions
after the clean.

CQ's problem has exposed this bug.  mlx4_cq_clean() needs to be invoked
**after** the destroy:

Index: libmlx4/src/verbs.c
===================================================================
--- libmlx4.orig/src/verbs.c    2008-03-26 09:00:08.000000000 +0200
+++ libmlx4/src/verbs.c 2008-03-26 09:00:52.449586000 +0200
@@ -558,11 +558,6 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp)
        struct mlx4_qp *qp = to_mqp(ibqp);
        int ret;
 
-       mlx4_cq_clean(to_mcq(ibqp->recv_cq), ibqp->qp_num,
-                      ibqp->srq ? to_msrq(ibqp->srq) : NULL);
-       if (ibqp->send_cq != ibqp->recv_cq)
-               mlx4_cq_clean(to_mcq(ibqp->send_cq), ibqp->qp_num, NULL);
-
        mlx4_lock_cqs(ibqp);
        mlx4_clear_qp(to_mctx(ibqp->context), ibqp->qp_num);
        mlx4_unlock_cqs(ibqp);
@@ -576,6 +571,11 @@ int mlx4_destroy_qp(struct ibv_qp *ibqp)
                return ret;
        }
 
+       mlx4_cq_clean(to_mcq(ibqp->recv_cq), ibqp->qp_num,
+                      ibqp->srq ? to_msrq(ibqp->srq) : NULL);
+       if (ibqp->send_cq != ibqp->recv_cq)
+               mlx4_cq_clean(to_mcq(ibqp->send_cq), ibqp->qp_num, NULL);
+
        if (!ibqp->srq && ibqp->qp_type != IBV_QPT_XRC)
                mlx4_free_db(to_mctx(ibqp->context), MLX4_DB_TYPE_RQ, qp->db);
        free(qp->sq.wrid);
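
To illustrate why the ordering matters, here is a minimal toy model in C. It is
not libmlx4 code: every type and function name in it (toy_ctx, hw_complete,
cq_clean, poll_one, destroy_then_poll) is hypothetical, and the value -2 for
POLL_ERR is chosen only to mirror the return value reported in the question.
The model assumes that a poll fails hard when it finds a completion whose QP
number is no longer in the userspace QP table, and that a QP can still produce
completions until the kernel has reset it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; all names here are hypothetical, not libmlx4 API. */
enum { POLL_OK = 0, POLL_EMPTY = -1, POLL_ERR = -2 };

#define CQ_DEPTH 8
#define MAX_QPN  4

struct toy_cq {
    int qpn[CQ_DEPTH];      /* QP number carried by each queued completion */
    size_t count;
};

struct toy_ctx {
    int qp_known[MAX_QPN];  /* userspace QP table: 1 if qpn is registered */
    int qp_live[MAX_QPN];   /* "hardware" state: 1 until the QP is reset */
    struct toy_cq cq;
};

/* Hardware side: a QP that has not been reset can still complete work. */
static void hw_complete(struct toy_ctx *c, int qpn)
{
    assert(qpn >= 0 && qpn < MAX_QPN);
    if (c->qp_live[qpn] && c->cq.count < CQ_DEPTH)
        c->cq.qpn[c->cq.count++] = qpn;
}

/* Drop every queued completion that belongs to qpn (the role of a CQ clean). */
static void cq_clean(struct toy_ctx *c, int qpn)
{
    size_t out = 0;
    for (size_t i = 0; i < c->cq.count; i++)
        if (c->cq.qpn[i] != qpn)
            c->cq.qpn[out++] = c->cq.qpn[i];
    c->cq.count = out;
}

/* Poll one completion; fail hard if it names a QP we no longer track. */
static int poll_one(struct toy_ctx *c)
{
    if (c->cq.count == 0)
        return POLL_EMPTY;
    int qpn = c->cq.qpn[0];
    for (size_t i = 1; i < c->cq.count; i++)
        c->cq.qpn[i - 1] = c->cq.qpn[i];
    c->cq.count--;
    return c->qp_known[qpn] ? POLL_OK : POLL_ERR;
}

/* Destroy QP 1 while a send is still pending, then poll the CQ once.
 * clean_first != 0 models the old order, clean_first == 0 the patched one. */
static int destroy_then_poll(int clean_first)
{
    struct toy_ctx c = { .qp_known = {1, 1}, .qp_live = {1, 1} };
    const int qpn = 1;

    if (clean_first)
        cq_clean(&c, qpn);   /* old order: clean while the QP is still live */
    c.qp_known[qpn] = 0;     /* remove the QP from the userspace table */
    hw_complete(&c, qpn);    /* pending send completes inside the window */
    c.qp_live[qpn] = 0;      /* kernel resets and destroys the QP */
    if (!clean_first)
        cq_clean(&c, qpn);   /* patched order: clean after the destroy */
    return poll_one(&c);
}
```

In this model destroy_then_poll(1) leaves a stale completion behind and the
poll returns POLL_ERR, while destroy_then_poll(0) cleans it up and the poll
returns POLL_EMPTY. The real driver also involves locking and SRQ handling
that the sketch leaves out.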



_______________________________________________
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
