What exactly were you testing? I have no idea how to interpret your numbers. Was this a single client reading from a single file? One file per OST, or one file striped across all OSTs? Was the Lustre file system idle apart from your test?
In general, start with the pieces:

1) Make sure the network is sane. Measure bandwidth to/from each node (client and server) to verify all the cables are good. For your configuration (QDR InfiniBand), you should be able to measure ~3.2 GB/s unidirectional using large MPI messages. While I prefer to use MPI, some people use lnet_selftest.

2) Make sure each OST is sane. For each OST, create a file that is striped only on that OST, and verify that a client can read and write each of these files at the expected rate. Be sure to transfer much more data than the combined client and server RAM, so you are measuring the disks rather than the cache.

Many issues are sorted out just getting 1 and 2 into good shape.

Kevin Tanin wrote:
> Dear all,
>
> I have two questions regarding the performance of our Lustre system.
> Currently, we have 5 OSS nodes, and each OSS carries 8 OSTs. All the
> nodes (including the MDT/MGS node and the client node) are connected
> to a Mellanox MTS 3600 InfiniBand switch, using RDMA for data
> transfer. The bandwidth of the network is 40 Gb/s. The kernel version
> is 'Linux 2.6.18-164.11.1.el5_lustre.1.8.3 #1 SMP Fri Apr 9 18:00:39
> MDT 2010 x86_64 x86_64 x86_64 GNU/Linux'. The OS is RHEL 5.5, Lustre
> is 1.8.3, OFED is 1.5.2, and the IB HCA is a Mellanox Technologies
> MT26428 ConnectX VPI PCIe IB QDR.
>
> I ran a simple test on the client side to measure peak read
> performance.
> Here is the data:
>
> #time   Data transferred   Bandwidth
> 2 sec   2.18 GBytes        8.71 Gbits/sec
> 2 sec   2.06 GBytes        8.24 Gbits/sec
> 2 sec   2.10 GBytes        8.40 Gbits/sec
> 2 sec   1.93 GBytes        7.73 Gbits/sec
> 2 sec   1.50 GBytes        6.02 Gbits/sec
> 2 sec   420.00 MBytes      1.64 Gbits/sec
> 2 sec   2.19 GBytes        8.75 Gbits/sec
> 2 sec   2.08 GBytes        8.32 Gbits/sec
> 2 sec   2.08 GBytes        8.32 Gbits/sec
> 2 sec   1.99 GBytes        7.97 Gbits/sec
> 2 sec   1.80 GBytes        7.19 Gbits/sec
> *2 sec  160.00 MBytes      640.00 Mbits/sec*
> 2 sec   2.15 GBytes        8.59 Gbits/sec
> 2 sec   2.13 GBytes        8.52 Gbits/sec
> 2 sec   2.15 GBytes        8.59 Gbits/sec
> 2 sec   2.09 GBytes        8.36 Gbits/sec
> 2 sec   2.09 GBytes        8.36 Gbits/sec
> 2 sec   2.07 GBytes        8.28 Gbits/sec
> 2 sec   2.15 GBytes        8.59 Gbits/sec
> 2 sec   2.11 GBytes        8.44 Gbits/sec
> 2 sec   2.05 GBytes        8.20 Gbits/sec
> *2 sec  0.00 Bytes         0.00 bits/sec*
> *2 sec  0.00 Bytes         0.00 bits/sec*
> 2 sec   1.95 GBytes        7.81 Gbits/sec
> 2 sec   2.14 GBytes        8.55 Gbits/sec
> 2 sec   1.99 GBytes        7.97 Gbits/sec
> 2 sec   2.00 GBytes        8.01 Gbits/sec
> 2 sec   370.00 MBytes      1.45 Gbits/sec
> 2 sec   1.96 GBytes        7.85 Gbits/sec
> 2 sec   2.03 GBytes        8.12 Gbits/sec
> 2 sec   1.89 GBytes        7.58 Gbits/sec
> 2 sec   1.94 GBytes        7.77 Gbits/sec
> 2 sec   640.00 MBytes      2.50 Gbits/sec
> 2 sec   1.47 GBytes        5.90 Gbits/sec
> 2 sec   1.94 GBytes        7.77 Gbits/sec
> 2 sec   1.90 GBytes        7.62 Gbits/sec
> 2 sec   1.94 GBytes        7.77 Gbits/sec
> 2 sec   1.18 GBytes        4.73 Gbits/sec
> 2 sec   940.00 MBytes      3.67 Gbits/sec
> 2 sec   1.97 GBytes        7.89 Gbits/sec
> 2 sec   1.93 GBytes        7.73 Gbits/sec
> 2 sec   1.87 GBytes        7.46 Gbits/sec
> 2 sec   1.77 GBytes        7.07 Gbits/sec
> 2 sec   320.00 MBytes      1.25 Gbits/sec
> 2 sec   1.97 GBytes        7.89 Gbits/sec
> 2 sec   2.00 GBytes        8.01 Gbits/sec
> 2 sec   1.89 GBytes        7.58 Gbits/sec
> 2 sec   1.93 GBytes        7.73 Gbits/sec
> 2 sec   350.00 MBytes      1.37 Gbits/sec
> 2 sec   1.77 GBytes        7.07 Gbits/sec
> 2 sec   1.92 GBytes        7.70 Gbits/sec
> 2 sec   2.05 GBytes        8.20 Gbits/sec
> 2 sec   2.01 GBytes        8.05 Gbits/sec
> 2 sec   710.00 MBytes      2.77 Gbits/sec
> 2 sec   1.59 GBytes        6.37 Gbits/sec
> 2 sec   2.00 GBytes        8.01 Gbits/sec
> 2 sec   710.00 MBytes      2.77 Gbits/sec
> 2 sec   1.59 GBytes        6.37 Gbits/sec
> 2 sec   2.00 GBytes        8.01 Gbits/sec
> 2 sec   1.88 GBytes        7.54 Gbits/sec
> 2 sec   1.62 GBytes        6.48 Gbits/sec
>
> 1. As you can see, although the peak bandwidth can reach 8.71 Gb/s,
> the performance is quite unstable (sometimes the bandwidth simply
> gets choked). All the OSS nodes seem to stop serving data
> simultaneously. I tried grouping different OSTs and turning the
> checksum on and off, but this still happens. Does anybody have a hint
> as to the reason?
>
> 2. As we know, when a Lustre client reads data, the data is moved
> from the OSS disks into OSS memory and then sent to the client. Apart
> from O_DIRECT, is there any other configuration to optimize disk data
> access, such as using sendfile, splice, or fio, that can greatly
> expedite it?
>
> fio: http://freshmeat.net/projects/fio/
>
> Any help will be greatly appreciated. Thanks!
>
> --
> Best regards,
>
> Li, Tan
> PhD Candidate & Research Assistant,
> Electrical Engineering,
> Stony Brook University, NY
>
> Personal Web Site: https://sites.google.com/site/homepagelitan/Home
> Email: [email protected]

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
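P.S. The per-OST check from step 2 above can be sketched roughly as follows. This is a minimal sketch, not a tuned benchmark: the mount point /mnt/lustre, the OST count of 40 (5 OSS x 8 OST), and the 64 GB file size are assumptions you should adjust for your system (the size must exceed combined client+server RAM).

```sh
#!/bin/sh
# Sanity-check each OST individually: create one file striped on a
# single OST, write well past cache size, drop the client page cache,
# then read it back. A slow or erratic OST will stand out.
MNT=/mnt/lustre   # assumed client mount point
NOSTS=40          # assumed: 5 OSS x 8 OSTs
SIZE_GB=64        # assumed to exceed combined client+server RAM

for i in $(seq 0 $((NOSTS - 1))); do
    f=$MNT/ost_test.$i

    # -c 1: stripe count of one; -i $i: place the file on OST index $i
    lfs setstripe -c 1 -i "$i" "$f"

    echo "OST $i write:"
    dd if=/dev/zero of="$f" bs=1M count=$((SIZE_GB * 1024)) conv=fsync 2>&1 | tail -1

    # Drop the client page cache so the read actually hits the OST
    echo 3 > /proc/sys/vm/drop_caches

    echo "OST $i read:"
    dd if="$f" of=/dev/null bs=1M 2>&1 | tail -1

    rm -f "$f"
done
```

Any OST whose dd rate is far below its siblings is worth a closer look before chasing client-side tuning; the network-level equivalent is step 1 with MPI or lnet_selftest.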
