I have verbs code that is modeled after the first usage model described on the
ibv_get_cq_event() man page. That is, I have created all the verbs resources
(completion channel, CQ, QP, etc.) and then followed this sequence:
ibv_req_notify_cq(cq, 0);
ibv_post_send(qp, &work_req, &bad_work_req);
ibv_get_cq_event(channel, &ev_cq, &ev_ctx);
ibv_ack_cq_events(ev_cq, 1);
ibv_req_notify_cq(cq, 0);
ibv_poll_cq(cq, 1, &wc); // loop to drain, but due to the upper-layer protocol there will only ever be 1 at a time
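For reference, here is how I understand the arm/notify rules behind that sequence, written as a toy C model. struct toy_cq and all toy_* functions are hypothetical stand-ins, not libibverbs calls. The model assumes, per my reading of the ibv_req_notify_cq() man page, that arming covers only the next CQE added to the CQ, which is why the sequence polls after re-arming: a CQE that lands while the CQ is disarmed raises no event and can only be found by polling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one CQ plus its completion channel.
 * It only mimics the arm/notify rules; it is NOT libibverbs. */
struct toy_cq {
    int  cqes;     /* completions sitting in the CQ, not yet polled     */
    bool armed;    /* set by toy_req_notify(), cleared when event fires */
    int  events;   /* events queued on the completion channel           */
    int  unacked;  /* events retrieved but not yet acknowledged         */
};

/* Stands in for ibv_req_notify_cq(cq, 0): arms for ONE notification. */
static void toy_req_notify(struct toy_cq *cq) { cq->armed = true; }

/* Stands in for the HCA adding a CQE: an event is raised only if armed. */
static void toy_add_cqe(struct toy_cq *cq)
{
    cq->cqes++;
    if (cq->armed) {
        cq->armed = false;   /* one-shot: must be re-armed after this */
        cq->events++;
    }
}

/* Stands in for a non-blocking ibv_get_cq_event(): false = nothing pending. */
static bool toy_get_event(struct toy_cq *cq)
{
    if (cq->events == 0)
        return false;
    cq->events--;
    cq->unacked++;
    return true;
}

/* Stands in for ibv_ack_cq_events(cq, n). */
static void toy_ack(struct toy_cq *cq, int n)
{
    assert(n <= cq->unacked);  /* cannot ack more events than were retrieved */
    cq->unacked -= n;
}

/* Stands in for ibv_poll_cq(cq, 1, &wc): returns the number of CQEs taken. */
static int toy_poll(struct toy_cq *cq)
{
    if (cq->cqes == 0)
        return 0;
    cq->cqes--;
    return 1;
}
```

Stepping through it: arm, add a CQE (the write completes), get and ack the event, re-arm, and the poll returns 1. A CQE added to a CQ that was never armed leaves the channel empty, so a blocking ibv_get_cq_event() would wait even though ibv_poll_cq() would still find the entry.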
The QP is created with the following attributes:
qp_init_attr.qp_context = &this->conn_ref;
qp_init_attr.send_cq = this->send_cq;
qp_init_attr.recv_cq = this->recv_cq;
qp_init_attr.srq = NULL;
qp_init_attr.cap.max_send_wr = 128;
qp_init_attr.cap.max_recv_wr = 4;
qp_init_attr.cap.max_send_sge = 16;
qp_init_attr.cap.max_recv_sge = 4;
qp_init_attr.cap.max_inline_data = 0;
qp_init_attr.qp_type = IBV_QPT_RC;
qp_init_attr.sq_sig_all = 0;
// I have also tried sq_sig_all = 1 with the IBV_SEND_SIGNALED flag removed from the send request
The send request (RDMA Write) is set up as:
sge.lkey = response_mr->lkey;
sge.addr = response;
sge.length = 256;
send_work_req.opcode = IBV_WR_RDMA_WRITE;
send_work_req.next = NULL;
send_work_req.sg_list = &sge;
send_work_req.num_sge = 1;
send_work_req.wr_id = 0;
send_work_req.imm_data = 0;
send_work_req.wr.rdma.remote_addr = client_rmr->addr;
send_work_req.wr.rdma.rkey = client_rmr->rkey;
send_work_req.send_flags = IBV_SEND_SIGNALED;
// I have tried both IBV_SEND_SIGNALED and IBV_SEND_SIGNALED | IBV_SEND_FENCE
This QP will be used to RDMA Write a response back to a client. With the
current setup, only one RDMA Write will be outstanding per QP at any given
time. That is, I issue the RDMA Write and wait for its completion before
continuing. The eventual goal is to request and process a completion event
every "n" RDMA Writes.
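For the "every n writes" plan, the constraint I expect to matter is send queue space. My understanding is that with sq_sig_all = 0, unsignaled WRs keep their SQ slots until the completion of a later signaled WR is polled, so n must not exceed max_send_wr. The sketch below (should_signal() and simulate() are hypothetical helpers) models only that slot accounting, under the simplifying assumption that each signaled WR completes and is reaped immediately after it is posted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: with sq_sig_all = 0, set IBV_SEND_SIGNALED only on
 * every n-th RDMA Write (wr_count counts from 1). */
static bool should_signal(unsigned long wr_count, unsigned n)
{
    assert(n > 0);
    return wr_count % n == 0;
}

/* Hypothetical SQ occupancy model: posts `total` writes, signaling every
 * n-th one. Assumption: each signaled WR completes and is reaped right
 * after it is posted, releasing its slot and every earlier unsignaled slot.
 * Returns how many posts succeeded before the SQ filled up. */
static unsigned long simulate(unsigned max_send_wr, unsigned n,
                              unsigned long total)
{
    unsigned outstanding = 0;   /* SQ slots currently in use */

    for (unsigned long i = 1; i <= total; i++) {
        if (outstanding == max_send_wr)
            return i - 1;       /* ibv_post_send() would fail here */
        outstanding++;
        if (should_signal(i, n))
            outstanding = 0;    /* its CQE frees it and all earlier WRs */
    }
    return total;
}
```

With max_send_wr = 10, simulate(10, 4, 100) and simulate(10, 10, 100) post all 100 writes, while simulate(10, 11, 100) stops after 10: the same pattern of the 11th post failing, but caused by missing signals rather than missing completions.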
The current problem is that everything runs fine for a while, and then I end up
blocked forever in the ibv_get_cq_event() call. The ibv_post_send() just prior
to that ibv_get_cq_event() call returned 0, indicating that it successfully
accepted the request. However, the completion event for that operation never
arrives. The data associated with that RDMA Write does not appear on the client
side either, so it seems that even though ibv_post_send() reported success, the
write was never actually carried out.
In order to debug the problem, I changed the completion channel to non-blocking,
put the ibv_get_cq_event() call in a loop, and printed the number of passes
through the loop (i.e., the number of calls to ibv_get_cq_event()) before an
event arrived (good status from the call). When all is working fine, it only
takes one or two calls for the event to arrive. When the hang occurs, it loops
forever calling ibv_get_cq_event(). I added a counter there, and after a large
number of retries (e.g., 500) I loop back and try the ibv_post_send() again.
For the most part, the request makes it out the second time. But, given enough
time, all of the send queue work request entries are consumed. That is, if I
lower the max_send_wr attribute to 10, then after 10 failed event collection
attempts and ibv_post_send() retries, the 11th ibv_post_send() fails with a -1
status code. So the work requests are not leaving the send queue.
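A bounded wait on the completion channel's file descriptor is an alternative to spinning on a non-blocking ibv_get_cq_event() with a retry counter. This sketch swaps in poll(2) with a timeout; wait_readable() is a hypothetical helper, and the assumption is that fd is channel->fd of the struct ibv_comp_channel, so a return of 1 means the following ibv_get_cq_event() should not block. It only bounds the wait; it does not explain why the completion never arrives.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: waits up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error or hang-up.
 * Note: on EINTR the full timeout is restarted (a simplification). */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    assert(fd >= 0);
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    if (rc <= 0)
        return rc;                        /* 0 = timed out, -1 = poll() error */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A timeout of 0 turns it into a pure non-blocking check, and a negative timeout makes poll() wait indefinitely, like a blocking ibv_get_cq_event().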
Any ideas on why the ibv_get_cq_event() would never see an event after a
"successful" send requesting a completion event?
thanks,
jimmy
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general