Hi all,

I have a simple program that tests back-to-back RDMA read performance. However, I am running into errors and cannot figure out why.
The basic flow of my program is:

client: calls ibv_post_send() to send 4 back-to-back messages to the server (no delay in between). Each message contains the (rkey, addr, size) of a local buffer, which is registered with remote read/write permissions. After that, ibv_poll_cq() is called to wait for the completions.

server: first posts enough receive WRs to the RQ. Upon receipt of each message, it immediately posts an RDMA read request, using the (rkey, addr, size) contained in that message.

--------------

Both client and server use RC QPs. I see the following errors.

On the client side, ibv_poll_cq() returns 4 CQEs, and one of them is an error:

  CQ:: wr_id=0x0, wc_opcode=IBV_WC_SEND, wc_status=remote invalid RD request, wc_flag=0x3b byte_len=11338758, immdata=1110104528, qp_num=0x0, src_qp=2290530758

The other 3 CQEs are successful.

On the server side, 3 of the 4 messages are received successfully. The remaining one produces an error CQE:

  CQ:: wr_id=0x8000000000, wc_opcode=Unknow-wc-opcode, wc_status=unknown, wc_flag=0x0 byte_len=9569287, immdata=0, qp_num=0x0, src_qp=265551872

The 3 RDMA reads corresponding to the successful receives all succeed.

However, if I pause the client for a short while after calling ibv_post_send() (usleep(100), for example), no error occurs. Can anyone point out the pitfall here? Thanks!

-----------

On both client and server I'm using an 'mthca0' HCA (MT25208). The QPs are initialized with "qp_attr.max_dest_rd_atomic = 4, qp_attr.max_rd_atomic = 4".
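A minimal sketch of the (rkey, addr, size) request message described above, in plain C with no libibverbs dependency. The 16-byte big-endian layout and the names rd_req, rd_req_encode and rd_req_decode are hypothetical illustrations, not the program's actual format. On the server, the decoded addr and rkey would go into wr.wr.rdma.remote_addr and wr.wr.rdma.rkey of an IBV_WR_RDMA_READ work request, with size as the length of the local SGE.

```c
#include <stdint.h>

/* Hypothetical wire format: 4-byte rkey, 8-byte address, 4-byte size,
 * all big-endian, 16 bytes per SEND. */
#define RD_REQ_MSG_LEN 16

struct rd_req {
    uint32_t rkey;  /* rkey of the client's registered buffer */
    uint64_t addr;  /* virtual address of that buffer */
    uint32_t size;  /* number of bytes the server should RDMA-read */
};

/* Write the low nbytes of v into p, most significant byte first. */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Read nbytes from p as a big-endian unsigned integer. */
static uint64_t get_be(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Client side: fill the buffer that one SEND work request will carry. */
void rd_req_encode(const struct rd_req *r, uint8_t buf[RD_REQ_MSG_LEN])
{
    put_be(buf, r->rkey, 4);
    put_be(buf + 4, r->addr, 8);
    put_be(buf + 12, r->size, 4);
}

/* Server side: decode a received message. Returns 0 on success, or -1 if
 * the completion's byte_len does not match the expected message size. */
int rd_req_decode(const uint8_t *buf, uint32_t byte_len, struct rd_req *r)
{
    if (byte_len != RD_REQ_MSG_LEN)
        return -1;
    r->rkey = (uint32_t)get_be(buf, 4);
    r->addr = get_be(buf + 4, 8);
    r->size = (uint32_t)get_be(buf + 12, 4);
    return 0;
}
```

Usage: the client calls rd_req_encode() for each of its 4 send buffers before posting the SENDs; the server calls rd_req_decode() on each successful receive completion, passing wc.byte_len, before building its RDMA read. Checking byte_len first means a short or garbled message is rejected instead of producing a read with a bogus rkey or address.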
The HCA's "devinfo -v" output is:

hca_id: mthca0
    fw_ver: 5.1.400
    node_guid: 0002:c902:0023:c04c
    sys_image_guid: 0002:c902:0023:c04f
    vendor_id: 0x02c9
    vendor_part_id: 25218
    hw_ver: 0xA0
    board_id: MT_0370130002
    phys_port_cnt: 2
    max_mr_size: 0xffffffffffffffff
    page_size_cap: 0xfffff000
    max_qp: 64512
    max_qp_wr: 16384
    device_cap_flags: 0x00001c76
    max_sge: 27
    max_sge_rd: 0
    max_cq: 65408
    max_cqe: 131071
    max_mr: 131056
    max_pd: 32764
    max_qp_rd_atom: 4
    max_ee_rd_atom: 0
    max_res_rd_atom: 258048
    max_qp_init_rd_atom: 128
    max_ee_init_rd_atom: 0
    atomic_cap: ATOMIC_HCA (1)
    max_ee: 0
    max_rdd: 0
    max_mw: 0
    max_raw_ipv6_qp: 0
    max_raw_ethy_qp: 0
    max_mcast_grp: 8192
    max_mcast_qp_attach: 56
    max_total_mcast_qp_attach: 458752
    max_ah: 0
    max_fmr: 0
    max_srq: 960
    max_srq_wr: 16384
    max_srq_sge: 27
    max_pkeys: 64
    local_ca_ack_delay: 15
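The rd_atomic settings can be sanity-checked against the device limits listed above. In libibverbs, a QP's max_dest_rd_atomic (responder resources) is bounded by the device's max_qp_rd_atom, and its max_rd_atomic (initiator depth) by max_qp_init_rd_atom; an initiator's max_rd_atomic is also normally kept at or below the peer's max_dest_rd_atomic. The sketch below encodes those three rules in plain C. struct rd_atom_limits and check_rd_atomic are hypothetical helpers; in a real program the limits would come from ibv_query_device() rather than being copied by hand.

```c
/* Hypothetical mirror of the two relevant device limits, copied from the
 * "devinfo -v" output (this is NOT the real struct ibv_device_attr). */
struct rd_atom_limits {
    int max_qp_rd_atom;      /* responder side: bounds max_dest_rd_atomic */
    int max_qp_init_rd_atom; /* initiator side: bounds max_rd_atomic */
};

/* Returns 0 if the requested QP attributes fit the local device and the
 * peer's responder resources, or a negative code for the first violation. */
int check_rd_atomic(const struct rd_atom_limits *dev,
                    int max_rd_atomic, int max_dest_rd_atomic,
                    int peer_max_dest_rd_atomic)
{
    if (max_dest_rd_atomic > dev->max_qp_rd_atom)
        return -1; /* more responder resources than the HCA supports */
    if (max_rd_atomic > dev->max_qp_init_rd_atom)
        return -2; /* deeper initiator depth than the HCA supports */
    if (max_rd_atomic > peer_max_dest_rd_atomic)
        return -3; /* more outstanding reads than the peer accepts */
    return 0;
}
```

With this HCA's values (max_qp_rd_atom: 4, max_qp_init_rd_atom: 128), the configuration described above, 4 and 4 on both sides, passes all three checks.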
