Thanks Richard, I appreciate your advice. I was able to saturate the channel using xdd with 10 threads writing to 10 OSTs, each OST on a different OSS. These are the results:
ETHERNET
                 T   Q  Bytes        Ops     Time     Rate      IOPS      Latency  %CPU
TARGET  Average  0   1  2147483648   65536   140.156  15.322    467.59    0.0021   39.16
TARGET  Average  1   1  2147483648   65536   140.785  15.254    465.50    0.0021   39.11
TARGET  Average  2   1  2147483648   65536   140.559  15.278    466.25    0.0021   39.14
TARGET  Average  3   1  2147483648   65536   176.141  12.192    372.07    0.0027   38.02
TARGET  Average  4   1  2147483648   65536   168.234  12.765    389.55    0.0026   38.54
TARGET  Average  5   1  2147483648   65536   140.823  15.250    465.38    0.0021   39.11
TARGET  Average  6   1  2147483648   65536   140.183  15.319    467.50    0.0021   39.16
TARGET  Average  8   1  2147483648   65536   176.432  12.172    371.45    0.0027   38.02
TARGET  Average  9   1  2147483648   65536   167.944  12.787    390.23    0.0026   38.57
        Combined 10 10  21474836480  655360  180.000  119.305   3640.89   0.0003  387.99

INFINIBAND
                 T   Q  Bytes        Ops     Time     Rate      IOPS      Latency  %CPU
TARGET  Average  0   1  2147483648   65536   9.369    229.217   6995.16   0.0001  480.40
TARGET  Average  1   1  2147483648   65536   9.540    225.110   6869.80   0.0001  474.25
TARGET  Average  2   1  2147483648   65536   8.963    239.582   7311.45   0.0001  479.85
TARGET  Average  3   1  2147483648   65536   9.480    226.521   6912.86   0.0001  478.21
TARGET  Average  4   1  2147483648   65536   9.109    235.748   7194.47   0.0001  480.83
TARGET  Average  5   1  2147483648   65536   9.284    231.299   7058.69   0.0001  479.04
TARGET  Average  6   1  2147483648   65536   8.839    242.947   7414.15   0.0001  480.55
TARGET  Average  7   1  2147483648   65536   9.210    233.166   7115.65   0.0001  480.17
TARGET  Average  8   1  2147483648   65536   9.373    229.125   6992.33   0.0001  475.13
TARGET  Average  9   1  2147483648   65536   9.184    233.828   7135.86   0.0001  480.25
        Combined 10 10  21474836480  655360  9.540    2251.097  68698.03  0.0000  4788.69

My estimate is 0.6 Gbit/s over ethernet (max 1 Gbit/s) and 16 Gbit/s over InfiniBand (max 40 Gbit/s).

Regards!

On 19/05/2014, at 17:37, Mohr Jr, Richard Frank (Rick Mohr) <rm...@utk.edu> wrote:

> Alfonso,
>
> Based on my attempts to benchmark single client Lustre performance, here is
> some advice/comments that I have.
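As a quick sanity check on the estimate above, the combined xdd rates can be converted directly to Gbit/s. The conversion below assumes the Rate column is reported in MB/s (xdd's default), which puts the runs close to ethernet line rate and around 18 Gbit/s on InfiniBand:

```shell
# Convert the combined xdd rates to Gbit/s.
# Assumes xdd's Rate column is MB/s: Gbit/s = rate * 8 / 1000.
awk 'BEGIN {
    printf "ethernet: %.2f Gbit/s\n",   119.305  * 8 / 1000
    printf "infiniband: %.2f Gbit/s\n", 2251.097 * 8 / 1000
}'
```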
> (YMMV)
>
> 1) On the IB client, I recommend disabling checksums (lctl set_param
> osc.*.checksums=0). Having checksums enabled sometimes results in a
> significant performance hit.
>
> 2) Single-threaded tests (like dd) will usually bottleneck before you can max
> out the total client performance. You need to use a multi-threaded tool
> (like xdd) and have several threads perform IO at the same time in order to
> measure aggregate single-client performance.
>
> 3) When using a tool like xdd, set up the test to run for a fixed amount of
> time rather than having each thread write a fixed amount of data. If all
> threads write a fixed amount of data (say 1 GB), and any of the threads
> run slower than the others, the stragglers can skew the aggregate
> throughput results.
>
> 4) To avoid contention at the OST level among the multiple threads on a
> single client, precreate the output files with stripe_count=1 and
> statically assign them evenly to the different OSTs. Have each thread write
> to a different file so that no two processes write to the same OST. If you
> don't have enough OSTs to saturate the client, you can always have two files
> per OST. Going beyond that will likely hurt more than help, at least for an
> ldiskfs backend.
>
> 5) In my testing, I seem to get worse results using direct I/O for write
> tests, so I usually just use buffered I/O. Based on my understanding, the
> max_dirty_mb parameter on the client (which defaults to 32 MB) limits the
> amount of dirty written data that can be cached for each OST. Unless you have
> increased this to a very large number, that parameter will likely mitigate
> any effects of client caching on the test results. (NOTE: This reasoning
> only applies to write tests. Any written data can still be cached by the
> client, and a subsequent read test might very well pull data from cache
> unless you have taken steps to flush the cached data.)
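Items 1, 3 and 4 above can be sketched as a short command sequence. This is a sketch only: it assumes a Lustre mount at /mnt/lustre and OST indices 0-9 (both hypothetical), and xdd option names vary between versions, so check the exact flags against your build before running:

```shell
# Sketch only -- requires a mounted Lustre client; the mount point and
# OST indices are hypothetical, and xdd flags vary by version.

# Item 1: disable client-side checksums for the benchmark
lctl set_param osc.*.checksums=0

# Item 4: precreate one single-stripe file per OST so that no two
# threads share an OST (-c sets stripe count, -i pins the OST index)
for i in $(seq 0 9); do
    lfs setstripe -c 1 -i "$i" /mnt/lustre/xdd_test.$i
done

# Item 3: run the write test for a fixed time rather than a fixed size
xdd -op write -targets 10 /mnt/lustre/xdd_test.{0..9} \
    -blocksize 4096 -reqsize 256 -timelimit 60
```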
>
> If you have 10 OSS nodes and 20 OSTs in your file system, I would start by
> running a test with 10 threads and have each thread write to a single OST on
> a different server. You can increase/decrease the number of threads as needed
> to see if the aggregate performance gets better/worse. On my clients with
> QDR IB, I typically see aggregate write speeds in the range of 2.5-3.0 GB/s.
>
> You are probably already aware of this, but just in case, make sure that the
> IB clients you use for testing don't also have ethernet connections to your
> OSS servers. If the client has an ethernet and an IB path to the same
> server, it will choose one of the paths to use. It could end up choosing
> ethernet instead of IB and mess up your results.
>
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
>
>
> On May 19, 2014, at 6:33 AM, "Pardo Diaz, Alfonso" <alfonso.pa...@ciemat.es>
> wrote:
>
>> Hi,
>>
>> I have migrated my Lustre 2.2 to 2.5.1 and I have equipped my OSS/MDS and
>> clients with InfiniBand QDR interfaces.
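Rick's last warning (the client silently preferring the ethernet path) can be checked from the client with lctl. The NID below is a hypothetical example; substitute one of your OSS NIDs:

```shell
# Verify the client actually reaches the servers over o2ib, not tcp.
lctl list_nids                        # client should list an @o2ib NID
lctl ping 10.0.0.1@o2ib               # hypothetical OSS NID; should succeed
lctl get_param osc.*.import \
    | grep current_connection         # shows the NID each OSC is really using
```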
>> I have compiled Lustre against OFED 3.2 and configured the lnet module with:
>>
>> options lnet networks="o2ib(ib0),tcp(eth0)"
>>
>> But when I compare Lustre performance over InfiniBand (o2ib), I get the
>> same performance as over ethernet (tcp):
>>
>> INFINIBAND TEST:
>> dd if=/dev/zero of=test.dat bs=1M count=1000
>> 1000+0 records in
>> 1000+0 records out
>> 1048576000 bytes (1,0 GB) copied, 5,88433 s, 178 MB/s
>>
>> ETHERNET TEST:
>> dd if=/dev/zero of=test.dat bs=1M count=1000
>> 1000+0 records in
>> 1000+0 records out
>> 1048576000 bytes (1,0 GB) copied, 5,97423 s, 154 MB/s
>>
>> This is my scenario:
>>
>> - 1 MDS with an SSD RAID10 MDT
>> - 10 OSS with 2 OSTs per OSS
>> - InfiniBand interfaces in connected mode
>> - CentOS 6.5
>> - Lustre 2.5.1
>> - Striped filesystem: "lfs setstripe -s 1M -c 10"
>>
>> I know my InfiniBand is working correctly, because with iperf3 between
>> client and servers I get 40 Gb/s over InfiniBand and 1 Gb/s over the
>> ethernet connections.
>>
>> Could you help me?
>>
>> Regards,
>>
>> Alfonso Pardo Diaz
>> System Administrator / Researcher
>> c/ Sola nº 1; 10200 Trujillo, ESPAÑA
>> Tel: +34 927 65 93 17  Fax: +34 927 32 32 37
>>
>> ----------------------------
>> Disclaimer:
>> This message and its attached files are intended exclusively for their
>> recipients and may contain confidential information.
>> If you received this e-mail in error, you are hereby notified that any
>> dissemination, copy or disclosure of this communication is strictly
>> prohibited and may be unlawful. In this case, please notify us by reply
>> and delete this email and its contents immediately.
>> ----------------------------
>>
>> _______________________________________________
>> HPDD-discuss mailing list
>> hpdd-disc...@lists.01.org
>> https://lists.01.org/mailman/listinfo/hpdd-discuss
>
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss