On 2010-08-22, at 11:58, burlen wrote:
> Andreas Dilger wrote:
>> Currently, 1MB is the largest bulk IO size, and is the typical size used by
>> clients for all IO.
>
> Is my understanding correct?
>
> A single RPC request will initiate an RDMA transfer of at most
> "max_pages_per_rpc" pages, where the page unit is the Lustre page size, 65536.
> Each RDMA transfer is executed in 1MB chunks. On a given client, if there are
> more than "max_pages_per_rpc" pages of data available to transfer, multiple
> RPCs are issued and multiple RDMAs are initiated.

No, max_pages_per_rpc is scaled down proportionately for systems with a large
PAGE_SIZE, so a single RPC still carries at most 1MB. This is because the node
doesn't know what the PAGE_SIZE of the peer is.

There is a patch in bugzilla that does what you propose: submit larger IO
request RPCs, and do multiple 1MB RDMA transfers per request. However, this
showed a performance _loss_ in some cases (in particular shared-file IO), and
the reason for this regression was never diagnosed.

> Would it be correct to say: The purpose of the "max_pages_per_rpc" parameter
> is to enable the servers to even out the individual progress of concurrent
> clients with a lot of data to move and more fairly apportion the available
> bandwidth amongst concurrently writing clients?

Yes, partly. The more important factor is max_rpcs_in_flight, which limits
the number of requests that a client can submit to each server at one time.
A research paper implemented a dynamic max_rpcs_in_flight and showed
performance improvements when few clients are active, and we'd like to
include that code in Lustre when it is ready.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
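
To make the scaling described above concrete, here is a rough sketch of the arithmetic in Python. It assumes only what the thread states: bulk IO is capped at 1MB per RPC, and the page count per RPC shrinks as PAGE_SIZE grows. The function names are hypothetical helpers, not Lustre APIs, and the max_rpcs_in_flight value of 8 in the usage lines is an illustrative assumption rather than a figure from this thread.

```python
# Sketch only: the names below are illustrative, not part of Lustre.
MAX_BULK_BYTES = 1 << 20  # 1MB bulk IO limit per RPC, as stated in the thread


def pages_per_rpc(page_size):
    """Pages that fit in one 1MB RPC for a given PAGE_SIZE in bytes."""
    if page_size <= 0 or MAX_BULK_BYTES % page_size != 0:
        raise ValueError("page_size must be a positive divisor of 1MB")
    return MAX_BULK_BYTES // page_size


def max_bytes_in_flight(page_size, rpcs_in_flight):
    """Upper bound on bulk data one client can have outstanding to one server."""
    return pages_per_rpc(page_size) * page_size * rpcs_in_flight


if __name__ == "__main__":
    for page_size in (4096, 65536):
        print(page_size, pages_per_rpc(page_size))
    # Assumed example value for max_rpcs_in_flight: 8
    print(max_bytes_in_flight(65536, 8))
```

With 4096-byte pages this gives 256 pages per RPC, and with 65536-byte pages it gives 16, so the RPC payload is 1MB either way. The per-server bound on outstanding data then depends only on max_rpcs_in_flight, which fits the point that it is the more important of the two parameters.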
