> We have installed and are currently testing a central InfiniBand Lustre
> filesystem shared between two different clusters. The Lustre OSS and MDS
> are running on some nodes of a specific I/O server cluster.
>
> The InfiniBand fabric is a bit exotic, as it interconnects several
> clusters with the Lustre cluster. We want to optimize the InfiniBand
> routes between the Lustre clients and the OSS nodes. We saw that the
> routes between two nodes generated by the subnet manager are different
> for each direction.
>
> So we need to understand how Lustre read/write I/O requests between the
> Lustre clients and the OSS nodes are "translated" into InfiniBand
> requests. What are the InfiniBand low-level protocols used by the driver?
The lustre/lnet IB drivers (o2iblnd is preferred) use RC queue pairs between
every pair of nodes. Each QP is configured with 16 buffers for receiving
small (up to 4K) messages, and a credit flow-control protocol ensures that we
never send a message unless a buffer is posted to receive it. Bulk data
(i.e. anything that can't fit into a "small" message) is sent via RDMA, using
message passing just to set up the RDMA and signal completion (a rough sketch
of this pattern is appended below).

> What kind of IB requests are issued when a Lustre client makes a "READ" or
> "WRITE" operation? Do you have some documentation available?

So a WRITE goes something like this:

    Client -> Server: Lustre WRITE RPC request message
    Client <- Server: RDMA setup message
    Client -> Server: RDMA + completion message
    Client <- Server: Lustre WRITE RPC reply message

Someone else might know if/where the Lustre RPC is documented - I'm afraid I
don't have any documentation for the IB LNDs to offer you.

> thanks for your help
>
> Philippe Gregoire.
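
To make the description above concrete, here is a minimal sketch of the same
pattern written against the userspace libibverbs API. This is not the actual
o2iblnd kernel code; the struct, buffer count, size limit, and function names
are illustrative assumptions that mirror the figures quoted above (16 receive
buffers, 4K small messages, RDMA for bulk data).

/* Illustrative sketch only -- NOT the actual o2iblnd implementation.
 * It mirrors the pattern described above using libibverbs: a fixed pool
 * of small receive buffers on an RC QP, a send-credit count so we never
 * SEND unless the peer has a buffer posted, and RDMA WRITE for bulk
 * data.  Names and sizes are assumptions for illustration. */
#include <stdint.h>
#include <infiniband/verbs.h>

#define SMALL_MSG_SIZE 4096     /* "small" message payload limit   */
#define NUM_RX_BUFFERS 16       /* receive buffers per connection  */

struct conn {
    struct ibv_qp *qp;                  /* RC queue pair to the peer  */
    struct ibv_mr *rx_mr;               /* registration of rx_bufs    */
    char           rx_bufs[NUM_RX_BUFFERS][SMALL_MSG_SIZE];
    int            send_credits;        /* peer rx buffers we may use */
};

/* Pre-post every small receive buffer; the peer holds one send credit
 * per buffer and gets credits back piggy-backed on later traffic. */
static int post_all_receives(struct conn *c)
{
    for (int i = 0; i < NUM_RX_BUFFERS; i++) {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)c->rx_bufs[i],
            .length = SMALL_MSG_SIZE,
            .lkey   = c->rx_mr->lkey,
        };
        struct ibv_recv_wr wr = { .wr_id = i, .sg_list = &sge, .num_sge = 1 };
        struct ibv_recv_wr *bad;

        if (ibv_post_recv(c->qp, &wr, &bad))
            return -1;
    }
    return 0;
}

/* Send a small message only if we hold a credit, i.e. the peer is
 * guaranteed to have a receive buffer posted for it. */
static int send_small(struct conn *c, struct ibv_mr *mr, void *msg, uint32_t len)
{
    struct ibv_sge sge = { .addr = (uintptr_t)msg, .length = len, .lkey = mr->lkey };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_SEND,
        .send_flags = IBV_SEND_SIGNALED,
    };
    struct ibv_send_wr *bad;

    if (len > SMALL_MSG_SIZE || c->send_credits <= 0)
        return -1;              /* must queue until credits come back */
    c->send_credits--;
    return ibv_post_send(c->qp, &wr, &bad);
}

/* Bulk data never travels in small messages: once a setup message has
 * carried the peer's (remote_addr, rkey), it is moved by RDMA WRITE. */
static int rdma_bulk(struct conn *c, struct ibv_mr *mr, void *buf, uint32_t len,
                     uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = len, .lkey = mr->lkey };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,
        .send_flags = IBV_SEND_SIGNALED,
    };
    struct ibv_send_wr *bad;

    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;
    return ibv_post_send(c->qp, &wr, &bad);
}

In terms of this sketch, the WRITE exchange shown above would roughly map to
send_small() for the RPC request, RDMA setup, completion, and RPC reply
messages, and rdma_bulk() for the data itself.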
