Copilot commented on code in PR #3261:
URL: https://github.com/apache/brpc/pull/3261#discussion_r3061832765
##########
src/brpc/rdma/rdma_endpoint.cpp:
##########
@@ -1384,6 +1400,24 @@ void RdmaEndpoint::DeallocateResources() {
}
if (move_to_rdma_resource_list) {
+ // When a QP is moved to the RESET state, all associated send and
+ // receive queues are flushed, meaning any outstanding WRs are
effectively
+ // abandoned by the hardware.
+ //
+ // However, the CQ associated with that QP is *not* cleared
automatically,
+ // meaning that it will still contain entries for WRs that completed
before
+ // the reset.
+ //
+ // The application should finish polling the CQ to remove these
obsolete
+ // entries before reusing the QP.
+ int ret = DrainCq(_resource->polling_cq);
+ ret += DrainCq(_resource->send_cq);
+ ret += DrainCq(_resource->recv_cq);
+ if (ret < 0) {
+ move_to_rdma_resource_list = false;
+ goto _reclaim;
Review Comment:
This change introduces new reset/reuse behavior (draining CQ(s) before
re-adding a prepared QP, and falling back to reclaim-on-failure), but there
doesn’t appear to be any existing test coverage for the prepared-QP reuse path.
Since there is a substantial RDMA test suite (`test/brpc_rdma_unittest.cpp`)
but no tests referencing `rdma_prepared_qp_*`, consider adding a targeted test
that exercises a reset while CQEs are pending and asserts the endpoint doesn’t
process stale completions after reset.
##########
src/brpc/rdma/rdma_endpoint.cpp:
##########
@@ -1315,6 +1315,21 @@ static void DeallocateCq(ibv_cq* cq) {
LOG_IF(WARNING, 0 != err) << "Fail to destroy CQ: " << berror(err);
}
+static int DrainCq(ibv_cq* cq) {
+ if (NULL == cq) {
+ return 0;
+ }
+
+ ibv_wc wc;
+ int ret;
+ do {
+ ret = ibv_poll_cq(cq, 1, &wc);
+ } while (ret > 0);
+
+ LOG_IF(ERROR, ret < 0) << "drain CQ failed: " << ret;
Review Comment:
`DrainCq()` logs the raw return value from `ibv_poll_cq()` on failure, but
`ibv_poll_cq()` returns a negative errno-style value. Logging just `ret` makes
debugging harder; consider logging the decoded error (e.g.,
`berror(-ret)`/`strerror(-ret)`) and/or include the CQ pointer so it’s clear
which CQ failed to drain.
##########
src/brpc/rdma/rdma_endpoint.cpp:
##########
@@ -1384,6 +1400,24 @@ void RdmaEndpoint::DeallocateResources() {
}
if (move_to_rdma_resource_list) {
+ // When a QP is moved to the RESET state, all associated send and
+ // receive queues are flushed, meaning any outstanding WRs are
effectively
+ // abandoned by the hardware.
+ //
+ // However, the CQ associated with that QP is *not* cleared
automatically,
+ // meaning that it will still contain entries for WRs that completed
before
+ // the reset.
+ //
+ // The application should finish polling the CQ to remove these
obsolete
+ // entries before reusing the QP.
+ int ret = DrainCq(_resource->polling_cq);
+ ret += DrainCq(_resource->send_cq);
+ ret += DrainCq(_resource->recv_cq);
+ if (ret < 0) {
+ move_to_rdma_resource_list = false;
+ goto _reclaim;
+ }
Review Comment:
The drain failure handling here is hard to follow and loses detail: summing
the three `DrainCq()` return codes makes it impossible to tell which CQ failed,
and introducing a `goto` for cleanup adds an uncommon control-flow pattern in
this file. Consider checking each drain result separately (with CQ name in the
log) and restructuring cleanup with an `if`/`else` or an early-return to avoid
the `goto`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]