> We have installed and are currently testing a central Infiniband lustre
> filesystem shared between two different clusters.  The lustre OSS and MDS
> are running on some nodes of a specific I/O server cluster.
>
> The infiniband fabric is a bit exotic as it interconnects several
> clusters with the Lustre cluster.  We want to optimize infiniband routes
> between the Lustre clients and the OSS nodes.  We saw that routes between
> two nodes generated by the subnet manager are different in each
> direction.
>
> So we need to understand how lustre IO read/write requests between Lustre
> clients and OSS are "translated" into Infiniband requests.  What are the
> Infiniband low-level protocols used by the driver?

The lustre/lnet IB drivers (o2iblnd is preferred) use RC queue pairs between
every pair of nodes.  Each QP is configured with 16 buffers for receiving
small (up to 4K) messages, and a credit flow control protocol ensures that
we never send a message unless a buffer has been posted to receive it.  Bulk
data (i.e. anything that can't fit into a "small" message) is sent via RDMA,
using message passing just to set up the RDMA and signal completion.  So a
bulk RPC amounts to a couple of small messages plus the RDMA itself; the
WRITE exchange below shows the pattern.
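To make the credit idea concrete, here is a minimal C sketch of that kind of
flow control, under my own assumptions: the struct and helper names (peer_t,
small_msg_t, post_small_send, ib_send_small, ...) are illustrative only and
are not o2iblnd's actual code.  The point of the scheme is that the sender
never posts a SEND the receiver has no buffer for, which on an RC QP would
otherwise stall in receiver-not-ready retries; credits travel back
piggy-backed on the peer's own messages.

    /* Hypothetical sketch of credit-based flow control for small messages.
     * Names are illustrative only, not the real o2iblnd data structures. */

    #include <stdint.h>

    #define SMALL_MSG_SIZE   4096   /* "small" messages fit one receive buffer */
    #define RX_BUFFERS       16     /* buffers pre-posted per connection */

    typedef struct {
        uint32_t credits;           /* credits piggy-backed to the peer */
        char     payload[SMALL_MSG_SIZE - sizeof(uint32_t)];
    } small_msg_t;

    typedef struct {
        int tx_credits;             /* sends we may issue = peer buffers free */
        int rx_returnable;          /* buffers we consumed but not yet acked */
    } peer_t;

    /* Placeholder for the actual verbs SEND of one small message. */
    extern int ib_send_small(peer_t *p, small_msg_t *msg);

    /* Send a small message only if the peer is guaranteed to have a
     * receive buffer posted for it; otherwise the caller must queue it. */
    int post_small_send(peer_t *p, small_msg_t *msg)
    {
        if (p->tx_credits == 0)
            return -1;                      /* no buffer at the peer: back off */

        p->tx_credits--;                    /* consume one credit */
        msg->credits = p->rx_returnable;    /* return credits we owe the peer */
        p->rx_returnable = 0;

        return ib_send_small(p, msg);
    }

    /* On receiving a small message: the buffer will be re-posted, and any
     * credits the peer piggy-backed on this message are taken back. */
    void handle_small_recv(peer_t *p, const small_msg_t *msg)
    {
        p->tx_credits  += msg->credits;     /* peer freed buffers on its side */
        p->rx_returnable++;                 /* we will re-post this buffer */
    }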

> What kind of IB requests are issued when a Lustre client makes a "READ" or
> "WRITE" operation?  Do you have any documentation available?

Client -> Server:  lustre WRITE RPC request message
Client <- Server:  RDMA setup message
Client -> Server:  RDMA + completion message
Client <- Server:  lustre WRITE RPC reply message
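
As a rough illustration of the two middle steps, here is a hedged libibverbs
sketch of what the client side might do once the server's RDMA setup message
has delivered a remote buffer address and rkey: issue an RDMA WRITE of the
bulk data into the server's buffer, then send the small completion message.
The message layout (rdma_setup_msg_t) and connection struct are assumptions
for illustration only; just the ibv_* structures and calls are the standard
verbs API.

    /* Sketch only: the message formats are invented; the verbs structures
     * (ibv_sge, ibv_send_wr, ibv_post_send) are the standard libibverbs API. */

    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical contents of the server's "RDMA setup" small message. */
    typedef struct {
        uint64_t remote_addr;   /* where the server wants the bulk data */
        uint32_t rkey;          /* remote key authorizing the RDMA */
    } rdma_setup_msg_t;

    /* Hypothetical per-connection state. */
    typedef struct {
        struct ibv_qp *qp;      /* the RC queue pair to the server */
        struct ibv_mr *bulk_mr; /* registered memory holding the WRITE data */
    } conn_t;

    /* Push the bulk data into the server's buffer with one RDMA WRITE. */
    static int rdma_write_bulk(conn_t *c, const rdma_setup_msg_t *setup,
                               void *bulk, uint32_t len)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)bulk,
            .length = len,
            .lkey   = c->bulk_mr->lkey,
        };

        struct ibv_send_wr wr;
        struct ibv_send_wr *bad_wr = NULL;

        memset(&wr, 0, sizeof(wr));
        wr.wr_id               = 1;
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.opcode              = IBV_WR_RDMA_WRITE;  /* no recv buffer consumed */
        wr.send_flags          = IBV_SEND_SIGNALED;  /* local completion wanted */
        wr.wr.rdma.remote_addr = setup->remote_addr;
        wr.wr.rdma.rkey        = setup->rkey;

        return ibv_post_send(c->qp, &wr, &bad_wr);
        /* Once the RDMA WRITE completes, the client sends the small
         * "completion" message (an ordinary SEND, consuming one credit)
         * so the server knows the data has landed. */
    }

Doing the bulk transfer as an RDMA means only the two small messages around
it consume credits and receive buffers; the data itself bypasses the
small-message path entirely.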

Someone else might know if/where the lustre RPC is documented - I'm afraid
I don't have any documentation for the IB LNDs to offer you.

>
> thanks for your help
>
> Philippe Gregoire.
> 

