Currently, the ibv_post_send()/ibv_post_recv() path through the kernel
(using /dev/infiniband/rdmacm) could be optimized by removing dynamic memory
allocations on that path.
The transmit/receive path currently works as follows:
The user calls ibv_post_send(), which invokes a vendor-specific function.
When the path has to go through the kernel, ibv_cmd_post_send() is called.
That function builds the POST_SEND message body that is passed to the kernel.
Since the number of SGEs is not known in advance, the message body is
allocated dynamically (see libibverbs/src/cmd.c).
In the kernel the message body is parsed and a chain of WRs and SGEs is
recreated, again using dynamic allocations, so that the kernel ends up with
a structure similar to the one in user space.
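To illustrate what happens today, here is a simplified sketch of the
sizing/allocation step (the helper name is made up; struct ibv_post_send and
struct ibv_kern_send_wr come from the libibverbs kernel-ABI header, struct
ibv_send_wr/ibv_sge from <infiniband/verbs.h>, and the real logic lives in
ibv_cmd_post_send()):

#include <stdlib.h>
#include <infiniband/verbs.h>

/* Sketch only: the command body size depends on how many WRs and SGEs
 * the caller chained together, so it is sized and allocated per post.
 * This is the dynamic allocation the proposal removes. */
static void *alloc_post_send_cmd(struct ibv_send_wr *wr, size_t *cmd_size)
{
	struct ibv_send_wr *i;
	unsigned wr_count = 0, sge_count = 0;

	for (i = wr; i; i = i->next) {
		wr_count++;
		sge_count += i->num_sge;
	}

	*cmd_size = sizeof(struct ibv_post_send) +
		    wr_count  * sizeof(struct ibv_kern_send_wr) +
		    sge_count * sizeof(struct ibv_sge);

	return malloc(*cmd_size);
}

The real code then copies each WR and its SGE list into that flat buffer and
writes it to the uverbs file descriptor.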
The proposed optimization is to remove those dynamic allocations by
redefining the structure passed to the kernel.
From
struct ibv_post_send {
	__u32 command;
	__u16 in_words;
	__u16 out_words;
	__u64 response;
	__u32 qp_handle;
	__u32 wr_count;
	__u32 sge_count;
	__u32 wqe_size;
	struct ibv_kern_send_wr send_wr[0];
};
To
struct ibv_post_send {
	__u32 command;
	__u16 in_words;
	__u16 out_words;
	__u64 response;
	__u32 qp_handle;
	__u32 wr_count;
	__u32 sge_count;
	__u32 wqe_size;
	struct ibv_kern_send_wr send_wr[512];
};
A similar change is required for the kernel struct ib_uverbs_post_send defined
in /ofa_kernel/include/rdma/ib_uverbs.h.
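For reference, a rough sketch of the kernel-side definition after the change
(field layout recalled from memory, please check against the actual header):

struct ib_uverbs_post_send {
	__u64 response;
	__u32 qp_handle;
	__u32 wr_count;
	__u32 sge_count;
	__u32 wqe_size;
	struct ib_uverbs_send_wr send_wr[512];
};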
This change limits the number of send_wr entries that can be passed from
unlimited (which the dynamic allocation guaranteed) to a fixed limit of 512.
I think this number should match the maximum number of send queue entries a
QP can have. Since IB/iWARP applications are low-latency applications, the
number of WRs posted in a single call is never unbounded in practice.
As a result, instead of performing a dynamic allocation, ibv_cmd_post_send()
fills the proposed structure directly and passes it to the kernel. Whenever
the number of send_wr entries exceeds the limit, ENOMEM is returned.
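A rough sketch of how the user-space fill could look (the constant and helper
names are made up; SGE lists and the opcode-specific union would be copied
exactly as in the existing code):

#include <errno.h>

#define IBV_CMD_MAX_SEND_WR 512		/* proposed fixed limit */

/* Sketch only: fill the fixed-size command body in place, no allocation. */
static int fill_post_send_cmd(struct ibv_post_send *cmd,
			      struct ibv_send_wr *wr)
{
	struct ibv_send_wr *i;
	unsigned n = 0;

	for (i = wr; i; i = i->next) {
		if (n >= IBV_CMD_MAX_SEND_WR)
			return ENOMEM;	/* WR chain exceeds the table */

		cmd->send_wr[n].wr_id      = i->wr_id;
		cmd->send_wr[n].num_sge    = i->num_sge;
		cmd->send_wr[n].opcode     = i->opcode;
		cmd->send_wr[n].send_flags = i->send_flags;
		/* SGEs and the opcode-specific wr union copied here */
		n++;
	}

	cmd->wr_count = n;
	return 0;
}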
In the kernel, in ib_uverbs_post_send(), instead of dynamically allocating
the ib_send_wr structures, a table of 512 ib_send_wr structures will be
defined and all entries will be linked into a singly-linked list, so the
qp->device->post_send(qp, wr, &bad_wr) API does not change.
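On the kernel side the linking could look roughly like this (names are
illustrative; note that 512 ib_send_wr entries are far too large for the
kernel stack, so the table would have to be static, per-CPU or allocated once
per uverbs file rather than a local variable):

/* Sketch only: link the pre-allocated table into the singly-linked list
 * that qp->device->post_send() expects, so the driver API is untouched. */
static struct ib_send_wr *link_wr_table(struct ib_send_wr *table,
					unsigned wr_count)
{
	unsigned n;

	for (n = 0; n < wr_count; n++)
		table[n].next = (n + 1 < wr_count) ? &table[n + 1] : NULL;

	return wr_count ? &table[0] : NULL;
}

ib_uverbs_post_send() would then fill table[0..wr_count-1] from the command
body and call qp->device->post_send(qp, link_wr_table(table, wr_count),
&bad_wr) as before.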
As far as I know, no driver uses that kernel path for posting buffers today,
so the iWARP multicast acceleration implemented in the NES driver would be
the first application that can utilize the optimized path.
Regards,
Mirek
Signed-off-by: Mirek Walukiewicz <[email protected]>