Jimmy Hill wrote:
I have verbs code that is modeled after the first usage model described on the 
ibv_get_cq_event() man page. That is, I have created all the verbs resources 
(e.g., completion channel, QP, CQ, etc.) and then followed the sequence of:

ibv_req_notify_cq(cq, 0);

ibv_post_send(qp, &work_req, &bad_work_req);

ibv_get_cq_event(channel, &ev_cq, &ev_ctx);

ibv_ack_cq_events(ev_cq, 1);

ibv_req_notify_cq(cq, 0);

ibv_poll_cq(cq, 1, &wc);  // loop to drain, but due to the upper protocol there will only ever be 1 at a time


The QP is created with the following attributes:
    qp_init_attr.qp_context              = &this->conn_ref;
    qp_init_attr.send_cq                  = this->send_cq;
    qp_init_attr.recv_cq                   = this->recv_cq;
    qp_init_attr.srq                          = NULL;
    qp_init_attr.cap.max_send_wr     = 128;
    qp_init_attr.cap.max_recv_wr     = 4;
    qp_init_attr.cap.max_send_sge    = 16;
    qp_init_attr.cap.max_recv_sge    = 4;
    qp_init_attr.cap.max_inline_data = 0;
    qp_init_attr.qp_type                   = IBV_QPT_RC;
    qp_init_attr.sq_sig_all                  = 0;
    // I have also tried sq_sig_all set to 1 with the SIGNALED flag removed from the send request

The Send request (RDMA Write) is formatted as:
    sge.lkey     = response_mr->lkey;
    sge.addr    = response;
    sge.length = 256;

    send_work_req.opcode                     = IBV_WR_RDMA_WRITE;
    send_work_req.next                         = NULL;
    send_work_req.sg_list                       = &sge;
    send_work_req.num_sge                   = 1;
    send_work_req.wr_id                        = 0;
    send_work_req.imm_data                  = 0;
    send_work_req.wr.rdma.remote_addr = client_rmr->addr;
    send_work_req.wr.rdma.rkey             = client_rmr->rkey;
    send_work_req.send_flags                = IBV_SEND_SIGNALED;
    // I have tried both IBV_SEND_SIGNALED and IBV_SEND_SIGNALED | IBV_SEND_FENCE

This QP will be used to RDMA Write a response back to a client. With the current setup, 
only one RDMA write will be outstanding per QP at any given time. That is, I issue the 
RDMA Write and wait for its completion prior to continuing processing. The eventual goal 
is to request and process a completion event every "n" RDMA Writes.
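
As a rough illustration of the "every n RDMA Writes" goal, here is a minimal sketch of how the send flags could be chosen per write. It assumes the QP was created with sq_sig_all = 0. The helper name flags_for_write() and the constant SKETCH_SEND_SIGNALED are hypothetical; the constant only stands in for IBV_SEND_SIGNALED so that the sketch compiles without libibverbs.

```c
#include <assert.h>

/* Stand-in for IBV_SEND_SIGNALED (assumption: real code would use the
 * constant from <infiniband/verbs.h> instead). */
#define SKETCH_SEND_SIGNALED 0x2u

/*
 * Hypothetical helper: return the send_flags for the write with zero-based
 * sequence number seq, so that every n-th write requests a completion.
 */
static unsigned int flags_for_write(unsigned long seq, unsigned long n)
{
    assert(n > 0);
    return ((seq + 1) % n == 0) ? SKETCH_SEND_SIGNALED : 0u;
}
```

One thing to keep in mind with this approach: unsignaled work requests still occupy send queue entries until a later signaled completion has been polled, so n has to stay below max_send_wr.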

The current problem is that everything runs along fine and then I end up in a situation 
where I block forever on the ibv_get_cq_event() call. The ibv_post_send() just prior to 
the ibv_get_cq_event() call returned "0" indicating that it successfully 
processed the command. However, the completion event for that operation never arrives. 
The data associated with that RDMA write does not appear on the client side, so it seems 
that even though the ibv_post_send() reported success, it really did not successfully 
process the request.

In order to debug the problem, I changed the completion channel to non-blocking 
and put the ibv_get_cq_event() call in a loop and dumped out the number of 
passes through the loop (i.e., number of calls to ibv_get_cq_event()) prior to 
the arrival of an event (good status from the call). When all is working fine, 
it only takes one or two calls for the event to arrive. When I encounter the 
situation where it blocked forever, it loops forever calling 
ibv_get_cq_event(). I added a counter there and after a large (e.g., 500) 
number of retries, I looped back up and tried the ibv_post_send() again. For 
the most part, the request makes it out the second time. But, given enough 
time, the send queue work request entries are all consumed. That is, if I lower 
the max_send_wr attribute to 10, after 10 failed event collection attempts and 
ibv_post_send() retries, the 11th ibv_post_send() will fail with -1 status 
code. So, the work request entries are not leaving the send queue.
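
The non-blocking debug loop described above could also wait with a timeout instead of counting retries. The following is only a sketch using plain POSIX calls; the helper names set_nonblocking() and wait_readable() are made up for illustration. With verbs, the descriptor passed in would presumably be the fd member of the completion channel.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>   /* pipe(), write(): handy for trying the helpers on a plain pipe */

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * If a signal interrupts the wait, it is restarted with the full timeout.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

When wait_readable() returns 1 on the channel descriptor, a following ibv_get_cq_event() call should find an event waiting. A return of 0 gives a well-defined point to log the timeout and inspect the QP state, rather than reposting the send.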

Any ideas on why ibv_get_cq_event() would never see an event after a "successful" send requesting a completion?

Try the following sequence:


ibv_req_notify_cq(cq, 0);

ibv_post_send(qp, &work_req, &bad_work_req);

ibv_get_cq_event(channel, &ev_cq, &ev_ctx);

ibv_ack_cq_events(ev_cq, 1);

ibv_req_notify_cq(cq, 0);

in a loop until the CQ is empty:
        ibv_poll_cq(cq, 1, &wc);  // loop to drain, but due to the upper protocol there will only ever be 1 at a time
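
The "loop until the CQ is empty" step can be sketched roughly as below. To keep the example compilable without libibverbs, the poll call is passed in as a function pointer that is assumed to follow the ibv_poll_cq() return convention (positive = number of completions returned, 0 = CQ empty, negative = error). struct wc_stub, drain_cq(), struct fake_cq and fake_poll() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct ibv_wc: only the fields this sketch looks at. */
struct wc_stub {
    unsigned long long wr_id;
    int status;               /* 0 plays the role of IBV_WC_SUCCESS */
};

/* Assumed to behave like ibv_poll_cq(): >0 entries returned, 0 empty, <0 error. */
typedef int (*poll_cq_fn)(void *cq, int num_entries, struct wc_stub *wc);

/*
 * Poll one entry at a time until the CQ reports empty.
 * Returns the number of completions drained, or -1 if polling failed.
 * *failed receives the number of completions whose status was not success.
 */
static int drain_cq(poll_cq_fn poll_cq, void *cq, int *failed)
{
    struct wc_stub wc;
    int drained = 0;
    int n;

    assert(poll_cq != NULL && failed != NULL);
    *failed = 0;

    while ((n = poll_cq(cq, 1, &wc)) > 0) {
        drained += n;
        if (wc.status != 0)
            (*failed)++;
    }
    return (n < 0) ? -1 : drained;
}

/* Fake CQ for trying the loop out: 'pending' entries, one of which fails. */
struct fake_cq {
    int pending;
    int fail_at;
};

static int fake_poll(void *cq, int num_entries, struct wc_stub *wc)
{
    struct fake_cq *q = cq;

    (void)num_entries;
    if (q->pending == 0)
        return 0;                      /* CQ is empty */
    wc->wr_id = (unsigned long long)q->pending;
    wc->status = (q->pending == q->fail_at) ? 1 : 0;
    q->pending--;
    return 1;
}
```

In real code the loop body would call ibv_poll_cq(cq, 1, &wc) directly and compare wc.status against IBV_WC_SUCCESS. Counting failed completions separately makes it visible when a work request completed with an error rather than not completing at all.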



Dotan
