Hi all,

I have a simple program that tests back-to-back RDMA read performance.
However, I am encountering errors and cannot tell why.

The basic flow of my program is:

client:
ibv_post_send() is called to send 4 back-to-back messages to the server
(no delay in between). Each message contains the (rkey, addr, size) of a
local buffer. The buffer is registered with remote read/write permissions.
After that, ibv_poll_cq() is called to wait for the completions.
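
For reference, here is a minimal sketch of how such a descriptor could be
encoded. The post does not show the actual message layout, so struct
buf_desc, pack_desc() and unpack_desc() are hypothetical. The sketch writes
the 64-bit addr and the 32-bit rkey and size in big-endian byte order, so
both hosts decode the same values regardless of struct padding or host
endianness.

```c
#include <stdint.h>

/* Hypothetical descriptor layout: the post only says each message
 * carries (rkey, addr, size), not how those fields are encoded. */
struct buf_desc {
    uint64_t addr;  /* virtual address of the registered client buffer */
    uint32_t rkey;  /* rkey from the client's memory registration */
    uint32_t size;  /* number of bytes the server should RDMA-read */
};

#define BUF_DESC_WIRE_SIZE 16

/* Store the low nbytes of v into p, most significant byte first. */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Read nbytes from p as a big-endian unsigned integer. */
static uint64_t get_be(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Client side: fill the send buffer before calling ibv_post_send(). */
void pack_desc(const struct buf_desc *d, uint8_t out[BUF_DESC_WIRE_SIZE])
{
    put_be(out, d->addr, 8);
    put_be(out + 8, d->rkey, 4);
    put_be(out + 12, d->size, 4);
}

/* Server side: decode a successfully received message. */
void unpack_desc(const uint8_t in[BUF_DESC_WIRE_SIZE], struct buf_desc *d)
{
    d->addr = get_be(in, 8);
    d->rkey = (uint32_t)get_be(in + 8, 4);
    d->size = (uint32_t)get_be(in + 12, 4);
}
```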

server:
First, enough receive WRs are posted to the RQ. Upon receipt of each
message, the server immediately posts an RDMA read request, using the
(rkey, addr, size) information contained in that message.
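
One common way to tie each receive completion back to the buffer holding
the message is to encode a slot index in wr_id. The sketch below assumes a
hypothetical pool of NUM_RECV_SLOTS buffers of SLOT_BYTES each (the post
only says "enough" receive WRs are posted). After a successful receive, the
server would decode the descriptor from that slot and fill an ibv_send_wr
with opcode IBV_WR_RDMA_READ, wr.rdma.remote_addr = addr, wr.rdma.rkey =
rkey and an SGE covering size bytes. A wr_id outside the pool, like the
0x8000000000 in the server-side error CQE, is then easy to spot.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_RECV_SLOTS 16  /* hypothetical: "enough" receive WRs */
#define SLOT_BYTES     16  /* hypothetical: one descriptor per slot */

/* One receive buffer per posted receive WR. */
static uint8_t recv_slots[NUM_RECV_SLOTS][SLOT_BYTES];

/* wr_id to place in the ibv_recv_wr that points at slot i. */
uint64_t wr_id_for_slot(size_t i)
{
    return (uint64_t)i;
}

/* Map a receive completion's wr_id back to its buffer.
 * Returns NULL if the wr_id does not name one of our slots. */
uint8_t *slot_for_wr_id(uint64_t wr_id)
{
    if (wr_id >= NUM_RECV_SLOTS)
        return NULL;
    return recv_slots[wr_id];
}
```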

--------------
Both client and server use RC QPs, and errors are observed on both sides.

On the client side, ibv_poll_cq() returns 4 CQEs, and one of them is an error:
CQ::  wr_id=0x0, wc_opcode=IBV_WC_SEND, wc_status=remote invalid RD
request, wc_flag=0x3b
     byte_len=11338758, immdata=1110104528, qp_num=0x0, src_qp=2290530758

The other 3 CQEs are successful.

On the server side, 3 of the 4 messages are received successfully. One
message produces an error CQE:
CQ::  wr_id=0x8000000000, wc_opcode=Unknow-wc-opcode,
wc_status=unknown, wc_flag=0x0
     byte_len=9569287, immdata=0, qp_num=0x0, src_qp=265551872

The 3 RDMA reads corresponding to the successful receives all succeed.
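
When reading these dumps, note that the libibverbs ibv_poll_cq()
documentation states that for a completion whose status is not
IBV_WC_SUCCESS, only wr_id, status, qp_num and vendor_err are valid;
opcode, byte_len, imm_data, wc_flags and src_qp may hold anything. Also,
ibv_poll_cq() returns the number of entries it actually filled (negative
on failure), and only those entries are meaningful. The sketch below uses
a hypothetical local struct wc_view in place of struct ibv_wc so that it
compiles without libibverbs (real code would pass the struct ibv_wc and use
ibv_wc_status_str() for the status text); it prints the extra fields only
for successful completions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of struct ibv_wc, kept local for the sketch. */
struct wc_view {
    uint64_t wr_id;
    int      status;      /* 0 stands in for IBV_WC_SUCCESS */
    uint32_t qp_num;
    uint32_t vendor_err;
    int      opcode;
    uint32_t byte_len;
    uint32_t src_qp;
};

#define WC_SUCCESS 0

/* Format one completion into buf. On error, only the fields that are
 * valid for failed completions are printed. Returns snprintf's result. */
int format_wc(const struct wc_view *wc, char *buf, size_t len)
{
    if (wc->status != WC_SUCCESS)
        return snprintf(buf, len,
                        "wr_id=0x%llx status=%d qp_num=0x%x vendor_err=0x%x",
                        (unsigned long long)wc->wr_id, wc->status,
                        (unsigned)wc->qp_num, (unsigned)wc->vendor_err);

    return snprintf(buf, len,
                    "wr_id=0x%llx status=%d qp_num=0x%x opcode=%d "
                    "byte_len=%u src_qp=0x%x",
                    (unsigned long long)wc->wr_id, wc->status,
                    (unsigned)wc->qp_num, wc->opcode,
                    (unsigned)wc->byte_len, (unsigned)wc->src_qp);
}
```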

However, if I pause the client program for a short while (usleep(100), for
example) after calling ibv_post_send(), no error occurs.
Can anyone point out the pitfall here? Thanks!


-----------
On both client and server, I'm using an 'mthca0' HCA of type MT25208. The
QPs are initialized with "qp_attr.max_dest_rd_atomic = 4,
qp_attr.max_rd_atomic = 4". "devinfo -v" gives the following information
for the device:

hca_id: mthca0
       fw_ver:                         5.1.400
       node_guid:                      0002:c902:0023:c04c
       sys_image_guid:                 0002:c902:0023:c04f
       vendor_id:                      0x02c9
       vendor_part_id:                 25218
       hw_ver:                         0xA0
       board_id:                       MT_0370130002
       phys_port_cnt:                  2
       max_mr_size:                    0xffffffffffffffff
       page_size_cap:                  0xfffff000
       max_qp:                         64512
       max_qp_wr:                      16384
       device_cap_flags:               0x00001c76
       max_sge:                        27
       max_sge_rd:                     0
       max_cq:                         65408
       max_cqe:                        131071
       max_mr:                         131056
       max_pd:                         32764
       max_qp_rd_atom:                 4
       max_ee_rd_atom:                 0
       max_res_rd_atom:                258048
       max_qp_init_rd_atom:            128
       max_ee_init_rd_atom:            0
       atomic_cap:                     ATOMIC_HCA (1)
       max_ee:                         0
       max_rdd:                        0
       max_mw:                         0
       max_raw_ipv6_qp:                0
       max_raw_ethy_qp:                0
       max_mcast_grp:                  8192
       max_mcast_qp_attach:            56
       max_total_mcast_qp_attach:      458752
       max_ah:                         0
       max_fmr:                        0
       max_srq:                        960
       max_srq_wr:                     16384
       max_srq_sge:                    27
       max_pkeys:                      64
       local_ca_ack_delay:             15
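
To double-check the read-depth settings against the limits above: in
libibverbs, max_dest_rd_atomic is applied with IBV_QP_MAX_DEST_RD_ATOMIC on
the INIT to RTR transition and max_rd_atomic with IBV_QP_MAX_QP_RD_ATOMIC on
the RTR to RTS transition. They are bounded by the device's max_qp_rd_atom
and max_qp_init_rd_atom respectively, and the side issuing the RDMA reads
(here, the server) should not use a larger max_rd_atomic than the target's
(here, the client's) max_dest_rd_atomic. The sketch uses hypothetical
mirror structs rather than the real struct ibv_device_attr and struct
ibv_qp_attr.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirrors of the relevant struct ibv_device_attr fields. */
struct rd_limits {
    int max_qp_rd_atom;       /* max incoming reads/atomics per QP */
    int max_qp_init_rd_atom;  /* max outstanding initiated reads per QP */
};

/* Hypothetical mirrors of the relevant struct ibv_qp_attr fields. */
struct rd_cfg {
    uint8_t max_rd_atomic;       /* reads this QP may have in flight */
    uint8_t max_dest_rd_atomic;  /* incoming reads this QP will accept */
};

/* Check one RDMA read direction: each side's setting must fit its own
 * device, and the initiator must not issue more reads than the
 * responder has resources for. */
bool rd_atomic_ok(const struct rd_cfg *init, const struct rd_limits *init_dev,
                  const struct rd_cfg *resp, const struct rd_limits *resp_dev)
{
    if (init->max_rd_atomic > init_dev->max_qp_init_rd_atom)
        return false;
    if (resp->max_dest_rd_atomic > resp_dev->max_qp_rd_atom)
        return false;
    return init->max_rd_atomic <= resp->max_dest_rd_atomic;
}
```

With the values given above (4 and 4, against max_qp_rd_atom = 4 and
max_qp_init_rd_atom = 128 on both HCAs), this check passes.
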
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
