[ofa-general] ibv_get_cq_event blocking forever after successful ibv_post_send...

Jimmy Hill Fri, 25 May 2007 14:22:25 -0700

I have verbs code that is modeled after the first usage model described on the 
ibv_get_cq_event() man page. That is, I have created all the verbs resources 
(e.g., completion channel, QP, CQ, etc.) and then followed the sequence of:


ibv_req_notify_cq(cq, 0);

ibv_post_send(qp, &send_work_req, &bad_work_req);

ibv_get_cq_event(channel, &ev_cq, &ev_ctx);

ibv_ack_cq_events(ev_cq, 1);

ibv_req_notify_cq(cq, 0);

ibv_poll_cq(cq, 1, &wc);  // loop to drain, but due to upper protocol, will only ever be 1 at a time


The QP is created with the following attributes:
    qp_init_attr.qp_context          = &this->conn_ref;
    qp_init_attr.send_cq             = this->send_cq;
    qp_init_attr.recv_cq             = this->recv_cq;
    qp_init_attr.srq                 = NULL;
    qp_init_attr.cap.max_send_wr     = 128;
    qp_init_attr.cap.max_recv_wr     = 4;
    qp_init_attr.cap.max_send_sge    = 16;
    qp_init_attr.cap.max_recv_sge    = 4;
    qp_init_attr.cap.max_inline_data = 0;
    qp_init_attr.qp_type             = IBV_QPT_RC;
    qp_init_attr.sq_sig_all          = 0;
// I have also used sq_sig_all set to 1 and then removed the SIGNALED flag in the send request

The Send request (RDMA Write) is formatted as:
    sge.lkey   = response_mr->lkey;
    sge.addr   = response;
    sge.length = 256;

    send_work_req.opcode              = IBV_WR_RDMA_WRITE;
    send_work_req.next                = NULL;
    send_work_req.sg_list             = &sge;
    send_work_req.num_sge             = 1;
    send_work_req.wr_id               = 0;
    send_work_req.imm_data            = 0;
    send_work_req.wr.rdma.remote_addr = client_rmr->addr;
    send_work_req.wr.rdma.rkey        = client_rmr->rkey;
    send_work_req.send_flags          = IBV_SEND_SIGNALED;
// I have used IBV_SEND_SIGNALED and IBV_SEND_SIGNALED | IBV_SEND_FENCE

This QP will be used to RDMA Write a response back to a client. With the 
current setup, only one RDMA write will be outstanding per QP at any given 
time. That is, I issue the RDMA Write and wait for its completion prior to 
continuing processing. The eventual goal is to request and process a completion 
event every "n" RDMA Writes.
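To make the "every n RDMA Writes" goal concrete, here is a rough sketch of how I would pick the send flags per work request. The helper name should_signal() and its counter argument are made up for illustration and are not part of libibverbs. It assumes sq_sig_all = 0, so that only work requests carrying IBV_SEND_SIGNALED generate a completion.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a libibverbs call): decide whether the work
 * request about to be posted should request a completion.
 *
 *   posted - number of work requests already posted on this QP
 *   n      - request a completion on every n-th work request (n > 0)
 *
 * Returns 1 for the n-th, 2n-th, 3n-th, ... work request, else 0.
 */
static int should_signal(unsigned long posted, unsigned int n)
{
    assert(n > 0);
    return (posted + 1) % n == 0;
}

/*
 * Intended use when filling in the send request (sketch only):
 *
 *   send_work_req.send_flags =
 *       should_signal(posted, n) ? IBV_SEND_SIGNALED : 0;
 *   ...post the request, then posted++ on success...
 */
```

As far as I understand it, send queue entries used by unsignaled work requests are only reclaimed once a later signaled completion has been polled, so n would have to stay below max_send_wr.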

The current problem is that everything runs along fine and then I end up in a 
situation where I block forever on the ibv_get_cq_event() call. The 
ibv_post_send() just prior to the ibv_get_cq_event() call returned "0" 
indicating that it successfully processed the command. However, the completion 
event for that operation never arrives. The data associated with that RDMA 
write does not appear on the client side, so it seems that even though the 
ibv_post_send() reported success, it really did not successfully process the 
request.

In order to debug the problem, I changed the completion channel to non-blocking 
and put the ibv_get_cq_event() call in a loop and dumped out the number of 
passes through the loop (i.e., number of calls to ibv_get_cq_event()) prior to 
the arrival of an event (good status from the call). When all is working fine, 
it only takes one or two calls for the event to arrive. When I encounter the 
situation where it blocked forever, it loops forever calling 
ibv_get_cq_event(). I added a counter there and, after a large number of 
retries (e.g., 500), I looped back up and tried the ibv_post_send() again. For 
the most part, the request makes it out the second time. But, given enough 
time, the send queue work request entries are all consumed. That is, if I lower 
the max_send_wr attribute to 10, then after 10 failed event collection attempts 
and ibv_post_send() retries, the 11th ibv_post_send() fails with a -1 status 
code. So, the work request entries are not leaving the send queue.
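For reference, the non-blocking debug setup described above boils down to roughly the following. This is only a sketch using plain POSIX calls so that it stands alone: in the real code the descriptor is the completion channel's fd (channel->fd), and the helper names set_nonblocking() and wait_readable() are made up for illustration. It waits with a timeout rather than counting retries.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode.
 * Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Wait until fd becomes readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    assert(fd >= 0);
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                         /* timed out, nothing to read */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/*
 * Intended use with the completion channel (sketch only):
 *
 *   set_nonblocking(channel->fd);
 *   if (wait_readable(channel->fd, 1000) == 1)
 *       ibv_get_cq_event(channel, &ev_cq, &ev_ctx);  // should not block now
 *   else
 *       ...timeout or error: no completion event arrived...
 */
```

In the failing case, it is the timeout branch that is hit over and over.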

Any ideas on why the ibv_get_cq_event() would never see an event after a 
"successful" send requesting a completion event?

thanks,
jimmy
