Currently, the ibv_post_send()/ibv_post_recv() path through the kernel
(using /dev/infiniband/rdmacm) could be optimized by removing dynamic memory
allocations on that path.
The transmit/receive path currently works as follows:
The user calls ibv_post_send(), which invokes a vendor-specific function.
When the path has to go through the kernel, ibv_cmd_post_send() is called.
That function builds the POST_SEND message body that is passed to the kernel.
Since the number of SGEs is not known in advance, the message body is
allocated dynamically (see libibverbs/src/cmd.c).
In the kernel the message body is parsed and a chain of WRs and SGEs is
recreated, again using dynamic allocations, so that the kernel ends up with
a structure similar to the one in user space.
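To illustrate what happens today, here is a simplified sketch of the
sizing/allocation step (the helper name is made up; struct ibv_post_send and
struct ibv_kern_send_wr come from the libibverbs kernel-ABI header, struct
ibv_send_wr/ibv_sge from <infiniband/verbs.h>, and the real logic lives in
ibv_cmd_post_send()):

#include <stdlib.h>
#include <infiniband/verbs.h>

/* Sketch only: the command body size depends on how many WRs and SGEs
 * the caller chained together, so it is sized and allocated per post.
 * This is the dynamic allocation the proposal removes. */
static void *alloc_post_send_cmd(struct ibv_send_wr *wr, size_t *cmd_size)
{
	struct ibv_send_wr *i;
	unsigned wr_count = 0, sge_count = 0;

	for (i = wr; i; i = i->next) {
		wr_count++;
		sge_count += i->num_sge;
	}

	*cmd_size = sizeof(struct ibv_post_send) +
		    wr_count  * sizeof(struct ibv_kern_send_wr) +
		    sge_count * sizeof(struct ibv_sge);

	return malloc(*cmd_size);
}

The real code then copies each WR and its SGE list into that flat buffer and
writes it to the uverbs file descriptor.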
The proposed optimization is to remove those dynamic allocations by
redefining the structure passed to the kernel.
From
struct ibv_post_send {
	__u32 command;
	__u16 in_words;
	__u16 out_words;
	__u64 response;
	__u32 qp_handle;
	__u32 wr_count;
	__u32 sge_count;
	__u32 wqe_size;
	struct ibv_kern_send_wr send_wr[0];
};
To
struct ibv_post_send {
	__u32 command;
	__u16 in_words;
	__u16 out_words;
	__u64 response;
	__u32 qp_handle;
	__u32 wr_count;
	__u32 sge_count;
	__u32 wqe_size;
	struct ibv_kern_send_wr send_wr[512];
};
A similar change is required for the kernel struct ib_uverbs_post_send defined
in /ofa_kernel/include/rdma/ib_uverbs.h.
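For reference, a rough sketch of the kernel-side definition after the change
(field layout recalled from memory, please check against the actual header):

struct ib_uverbs_post_send {
	__u64 response;
	__u32 qp_handle;
	__u32 wr_count;
	__u32 sge_count;
	__u32 wqe_size;
	struct ib_uverbs_send_wr send_wr[512];
};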
This change limits the number of send_wr entries that can be passed from
unlimited (which the dynamic allocation guaranteed) to a fixed limit of 512.
I think this number should match the maximum number of send queue entries a
QP can have. Since IB/iWARP applications are low-latency applications, the
number of WRs posted in a single call is never unbounded in practice.
As a result, instead of performing a dynamic allocation, ibv_cmd_post_send()
fills the proposed structure directly and passes it to the kernel. Whenever
the number of send_wr entries exceeds the limit, ENOMEM is returned.
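A rough sketch of how the user-space fill could look (the constant and helper
names are made up; SGE lists and the opcode-specific union would be copied
exactly as in the existing code):

#include <errno.h>

#define IBV_CMD_MAX_SEND_WR 512		/* proposed fixed limit */

/* Sketch only: fill the fixed-size command body in place, no allocation. */
static int fill_post_send_cmd(struct ibv_post_send *cmd,
			      struct ibv_send_wr *wr)
{
	struct ibv_send_wr *i;
	unsigned n = 0;

	for (i = wr; i; i = i->next) {
		if (n >= IBV_CMD_MAX_SEND_WR)
			return ENOMEM;	/* WR chain exceeds the table */

		cmd->send_wr[n].wr_id      = i->wr_id;
		cmd->send_wr[n].num_sge    = i->num_sge;
		cmd->send_wr[n].opcode     = i->opcode;
		cmd->send_wr[n].send_flags = i->send_flags;
		/* SGEs and the opcode-specific wr union copied here */
		n++;
	}

	cmd->wr_count = n;
	return 0;
}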
In the kernel, in ib_uverbs_post_send(), instead of dynamically allocating
the ib_send_wr structures, a table of 512 ib_send_wr structures will be
defined and all entries will be linked into a singly-linked list, so the
qp->device->post_send(qp, wr, &bad_wr) API does not change.
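On the kernel side the linking could look roughly like this (names are
illustrative; note that 512 ib_send_wr entries are far too large for the
kernel stack, so the table would have to be static, per-CPU or allocated once
per uverbs file rather than a local variable):

/* Sketch only: link the pre-allocated table into the singly-linked list
 * that qp->device->post_send() expects, so the driver API is untouched. */
static struct ib_send_wr *link_wr_table(struct ib_send_wr *table,
					unsigned wr_count)
{
	unsigned n;

	for (n = 0; n < wr_count; n++)
		table[n].next = (n + 1 < wr_count) ? &table[n + 1] : NULL;

	return wr_count ? &table[0] : NULL;
}

ib_uverbs_post_send() would then fill table[0..wr_count-1] from the command
body and call qp->device->post_send(qp, link_wr_table(table, wr_count),
&bad_wr) as before.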
As far as I know, no driver uses that kernel path for posting buffers today,
so the iWARP multicast acceleration implemented in the NES driver would be
the first application that can utilize the optimized path.
Regards,
Mirek
Signed-off-by: Mirek Walukiewicz <[email protected]>