I am trying to figure out how efficient MR registration followed by an RDMA write is.
To measure this, I am running the following loop:
// create MR of size 64KB
for (i = 0; i < max_writes; i++) {
    // destroy old MR
    // create MR of size 64KB
    // RDMA write from new MR to some remote buffer
}
At some point (varying) I get the following error:
iwch_ev_dispatch - CQE Err qpid 0x3d00 opcode 0 status 0x1 type 1 wrid.hi 0xb3 wrid.lo 0x0
post_qp_event - AE qpid 0x3d00 opcode 0 status 0x1 type 1 wrid.hi 0xb3 wrid.lo 0x0
...which basically tells me that the egress (type 1) RDMA write (opcode 0) has failed due to an invalid STag
(status 0x1 = STAG invalid: the STag is either out of range, zero, or has an STag key mismatch).
The error occurs at ibv_post_send().
Here is a trace of the WRs posted shortly before the 'crash':
wr_id=178
loc_addr=0x2aaaab64f010
loc_len=65536
lkey=4552191
num_sge=1
rem_addr=0x2aaaab5d0010
rkey=1459967
wr_id=179
loc_addr=0x2aaaab65f010
loc_len=65536
lkey=4555263
num_sge=1
rem_addr=0x2aaaab5e0010
rkey=1459967
ASYNC_EVENT: [QP] Local access violation error
wr_id=180
loc_addr=0x2aaaab66f010
loc_len=65536
lkey=4555519
num_sge=1
rem_addr=0x2aaaab5f0010
rkey=1459967
ERROR: [rdma_write] failed to post rdma write wr
ERROR: rdma write (180/1000) failed
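To make the keys in the trace easier to compare, they can be split into their index and key parts. This is only a minimal sketch, assuming the usual iWARP STag layout (upper 24 bits are the STag index, lowest 8 bits are the STag key); the helper names stag_index, stag_key and print_stag are made up for illustration and are not part of libibverbs:

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: a 32-bit STag carries a 24-bit index in the upper bits
 * and an 8-bit key in the lowest byte. All names here are hypothetical. */
static uint32_t stag_index(uint32_t stag)
{
    return stag >> 8;
}

static uint8_t stag_key(uint32_t stag)
{
    return (uint8_t)(stag & 0xffu);
}

/* Print one key from the trace in hex, split into index and key. */
static void print_stag(const char *name, uint32_t stag)
{
    printf("%-10s 0x%08x  index=0x%06x  key=0x%02x\n",
           name, (unsigned)stag,
           (unsigned)stag_index(stag), (unsigned)stag_key(stag));
}
```

Under that assumption, calling print_stag() on the lkeys of wr_id 178, 179 and 180 (4552191, 4555263, 4555519) gives indexes 0x4575, 0x4581 and 0x4582, each with key byte 0xff, and the rkey 1459967 gives index 0x1646 with the same key byte.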
Do you have any idea what could be happening here? I noticed that if I do signaled writes and wait for each
individual completion, this does not happen. It is also not an issue when posting smaller RDMA writes.
With 64KB or larger it does happen... but why? I assume that as soon as ibv_reg_mr() returns I am free
to use the MR, right?
Many thanks for your advice,
Phil
