Hi,
I have an issue with ibv_post_send.
Here is what I am doing:
I have 3 work requests, chained through the next field, with the last
next set to NULL. Each work request has one sg_list entry.
The wr_ids are 0, 1 and 2.
All opcodes are IBV_WR_RDMA_WRITE.
All send_flags = IBV_SEND_INLINE | IBV_SEND_SIGNALED;
Mellanox NIC = InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe
2.0 5GT/s - IB QDR / 10GigE] (rev b0)
(HCA Firmware version: 2.8.600)
I am using RDMA CM
The OS is Linux ...... 2.6.18-164.el5 #1 SMP Fri Sep 17 12:37:48 PDT 2010
x86_64 x86_64 x86_64 GNU/Linux
After all the initializations (memory registrations, etc.) my process
executes its first ibv_post_send.
It returns -1.
I look at the first element of bad_wr updated by the call. It shows a
wr_id field that is totally unrelated (anything but 0, 1 or 2).
I print errno. The value is zero.
The InfiniBand software I am using is MLNX OFED 1.5.1.
(If I send only one work request in ibv_post_send - it works.)
BTW - The same code has worked on an Intel NetEffect card before, on an
Ubuntu 10.04 server OS, with no issues.
How do I go about debugging this problem? I have put in additional details
below.
Thanks,
Manoj
Additional details
code snips start ------------------------------------------------------------
int rc;
struct ibv_send_wr bad_send_wr[3];
.
.
.
bad_send_wr[0].wr_id = bad_send_wr[1].wr_id = bad_send_wr[2].wr_id = 0; // no other init or memset of bad_send_wr array
errno = 0;
rc = ibv_post_send(rdmaptr->cm_id->qp, &rdmaptr->wr[0], &bad_send_wr);
if (rc){
printf("There is something wrong in this RDMA write case rc = %d, wr_id = %d, errno = %d\n", rc, bad_send_wr[0].wr_id, errno);
exit(1);
.
code snips end --------------------------------------------------------------
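For comparison, the libibverbs prototype declares the third argument of ibv_post_send as struct ibv_send_wr **bad_wr, that is, the address of a single pointer which the call points at the first failing work request inside the caller's own chain. Below is a minimal, self-contained sketch of that calling pattern. It does not use libibverbs: struct mock_wr, mock_post_send, build_chain and post_and_report are hypothetical stand-ins invented for illustration, and the choice of which request "fails" is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct ibv_send_wr: only the fields used here. */
struct mock_wr {
    uint64_t wr_id;
    struct mock_wr *next;
};

/*
 * Hypothetical stand-in for ibv_post_send(). It walks the chain and "fails"
 * on the request whose wr_id equals fail_id, storing a pointer to that
 * request in *bad_wr. Returns 0 on success and -1 on failure.
 */
static int mock_post_send(struct mock_wr *wr, struct mock_wr **bad_wr,
                          uint64_t fail_id)
{
    for (; wr != NULL; wr = wr->next) {
        if (wr->wr_id == fail_id) {
            *bad_wr = wr;
            return -1;
        }
    }
    return 0;
}

/* Fills the caller's array with a 3-entry chain: wr_id 0 -> 1 -> 2 -> NULL. */
static void build_chain(struct mock_wr wr[3])
{
    for (int i = 0; i < 3; i++) {
        wr[i].wr_id = (uint64_t)i;
        wr[i].next = (i < 2) ? &wr[i + 1] : NULL;
    }
}

/*
 * Posts the chain and returns the wr_id of the failing request,
 * or UINT64_MAX if every request was accepted.
 */
static uint64_t post_and_report(struct mock_wr wr[3], uint64_t fail_id)
{
    struct mock_wr *bad_wr = NULL;  /* one pointer, not an array of structs */
    int rc = mock_post_send(&wr[0], &bad_wr, fail_id);

    if (rc == 0)
        return UINT64_MAX;

    assert(bad_wr != NULL);
    printf("post failed: rc = %d, bad wr_id = %llu\n",
           rc, (unsigned long long)bad_wr->wr_id);
    return bad_wr->wr_id;
}
```

With bad_wr declared as a pointer like this, the wr_id read through it after a failure is always one of the caller's own IDs (0, 1 or 2 for a chain like the one described above). The wr_id is 64 bits wide, so the sketch prints it with %llu rather than %d.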
gdb snip start (breakpoint on printf) ---------------------------------------
(gdb) p rc
$17 = <value optimized out> #my comment - printf prints rc as -1
(gdb) p rdmaptr->wr[0]
$18 = {wr_id = 0, next = 0x7fff6703bfe0, sg_list = 0x7fff6703c080,
num_sge = 1, opcode = IBV_WR_RDMA_WRITE, send_flags = 10, imm_data = 0,
wr = {rdma = {remote_addr = 47878802383600, rkey = 3892586125}, atomic = {
remote_addr = 47878802383600, compare_add = 3892586125, swap = 0,
rkey = 0}, ud = {ah = 0x2b8ba70222f0, remote_qpn = 3892586125,
remote_qkey = 0}}, xrc_remote_srq_num = 0}
(gdb) p rdmaptr->wr[1]
$19 = {wr_id = 1, next = 0x7fff6703c030, sg_list = 0x7fff6703c090,
num_sge = 1, opcode = IBV_WR_RDMA_WRITE, send_flags = 10, imm_data = 0,
wr = {rdma = {remote_addr = 47878802319112, rkey = 3892586125}, atomic = {
remote_addr = 47878802319112, compare_add = 3892586125, swap = 0,
rkey = 0}, ud = {ah = 0x2b8ba7012708, remote_qpn = 3892586125,
remote_qkey = 0}}, xrc_remote_srq_num = 0}
(gdb) p rdmaptr->wr[2]
$20 = {wr_id = 2, next = 0x0, sg_list = 0x7fff6703c0a0, num_sge = 1,
opcode = IBV_WR_RDMA_WRITE, send_flags = 10, imm_data = 0, wr = {rdma = {
remote_addr = 47878802319088, rkey = 3892586125}, atomic = {
remote_addr = 47878802319088, compare_add = 3892586125, swap = 0,
rkey = 0}, ud = {ah = 0x2b8ba70126f0, remote_qpn = 3892586125,
remote_qkey = 0}}, xrc_remote_srq_num = 0}
(gdb) p &rdmaptr->wr[1]
$21 = (struct ibv_send_wr *) 0x7fff6703bfe0 # my comment - just to show that this is next in list to rdmaptr->wr[0]
(gdb) p &rdmaptr->wr[1]
$22 = (struct ibv_send_wr *) 0x7fff6703bfe0 # my comment - just to show that this is next in list to rdmaptr->wr[1]
(gdb) p &rdmaptr->wr[2]
$23 = (struct ibv_send_wr *) 0x7fff6703c030
(gdb) p rdmaptr->wr[0].sg_list[0]
$24 = {addr = 47966518199024, length = 512, lkey = 2684626508}
(gdb) p rdmaptr->wr[1].sg_list[0]
$25 = {addr = 47966518134536, length = 8, lkey = 2684626508}
(gdb) p rdmaptr->wr[2].sg_list[0]
$26 = {addr = 47966518134512, length = 8, lkey = 2684626508}
(gdb) p errno
$27 = 0
(gdb) p bad_send_wr[0]
$28 = {wr_id = 140734921686928, next = 0x2ba013203fc8,
sg_list = 0x7fff6703bd10, num_sge = 1728298176, opcode = 32767,
send_flags = -338177694, imm_data = 0, wr = {rdma = {
remote_addr = 140734921686232, rkey = 0}, atomic = {
remote_addr = 140734921686232, compare_add = 0, swap = 212479283730,
rkey = 0}, ud = {ah = 0x7fff6703bcd8, remote_qpn = 0, remote_qkey = 0}},
xrc_remote_srq_num = 0}
(gdb) p bad_send_wr[1]
$29 = {wr_id = 0, next = 0x2ba013203c68, sg_list = 0x1, num_sge = -1,
opcode = IBV_WR_RDMA_WRITE, send_flags = 1, imm_data = 0, wr = {rdma = {
remote_addr = 2184432, rkey = 1728298720}, atomic = {
remote_addr = 2184432, compare_add = 140734921686752,
swap = 47966518132736, rkey = 323070272}, ud = {ah = 0x2154f0,
remote_qpn = 1728298720, remote_qkey = 32767}},
xrc_remote_srq_num = 194951584}
(gdb) p bad_send_wr[2]
$30 = {wr_id = 0, next = 0x3178c0cd15, sg_list = 0x2ba000000001, num_sge = 0,
opcode = IBV_WR_RDMA_WRITE, send_flags = 320876544, imm_data = 11168, wr = {
rdma = {remote_addr = 4195498, rkey = 320888632}, atomic = {
remote_addr = 4195498, compare_add = 47966515650360, swap = 10005,
rkey = 1728298512}, ud = {ah = 0x4004aa, remote_qpn = 320888632,
remote_qkey = 11168}}, xrc_remote_srq_num = 2025924754}
gdb snip end (breakpoint on printf) -----------------------------------------
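The send_flags = 10 that gdb prints for each work request can be cross-checked against the individual flag bits. The sketch below assumes the bit values of enum ibv_send_flags in the libibverbs header (FENCE = 1, SIGNALED = 2, SOLICITED = 4, INLINE = 8). The constants are re-declared locally with an X_ prefix so the example builds without verbs.h, and decode_send_flags is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local copies of the assumed ibv_send_flags bit values. */
enum {
    X_SEND_FENCE     = 1 << 0,
    X_SEND_SIGNALED  = 1 << 1,
    X_SEND_SOLICITED = 1 << 2,
    X_SEND_INLINE    = 1 << 3
};

/*
 * Hypothetical helper: writes the names of the flags set in 'flags' into
 * buf, separated by " | ", and returns buf. Output is truncated if buf is
 * too small; len must be at least 1.
 */
static const char *decode_send_flags(int flags, char *buf, size_t len)
{
    static const struct { int bit; const char *name; } table[] = {
        { X_SEND_FENCE,     "IBV_SEND_FENCE" },
        { X_SEND_SIGNALED,  "IBV_SEND_SIGNALED" },
        { X_SEND_SOLICITED, "IBV_SEND_SOLICITED" },
        { X_SEND_INLINE,    "IBV_SEND_INLINE" },
    };

    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (!(flags & table[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " | ", len - strlen(buf) - 1);
        strncat(buf, table[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```

Under those assumed values, decode_send_flags(10, buf, sizeof buf) gives "IBV_SEND_SIGNALED | IBV_SEND_INLINE", which agrees with the flags set on all three work requests.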