I got a basic Lustre cluster up and running and ran two experiments: (1) using GbE as the interconnect, and (2) using QDR IB as the interconnect.
Here are the simple performance results I collected, following the pointers in the Lustre user guide.

GbE:

[root@22-82 ~]# ost-survey /mnt/lustre01
/usr/bin/ost-survey: 06/29/11 OST speed survey on /mnt/lustre01 from 172.21.22.82@tcp
Number of Active OST devices : 8
Worst  Read OST indx: 1 speed: 57.704295
Best   Read OST indx: 3 speed: 61.655312
Read Average: 59.785920 +/- 1.245626 MB/s
Worst  Write OST indx: 3 speed: 28.564328
Best   Write OST indx: 5 speed: 70.976016
Write Average: 55.497721 +/- 13.457404 MB/s
Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
----------------------------------------------------
0     59.931       65.722        0.501      0.456
1     57.704       42.436        0.520      0.707
2     60.056       66.074        0.500      0.454
3     61.655       28.564        0.487      1.050
4     58.096       62.979        0.516      0.476
5     59.935       70.976        0.501      0.423
6     61.053       57.724        0.491      0.520
7     59.856       49.507        0.501      0.606

QDR IB:

[root@22-82_ib ~]# ost-survey /mnt/lustre01
/usr/bin/ost-survey: 07/14/11 OST speed survey on /mnt/lustre01 from 10.1.3.82@o2ib
Number of Active OST devices : 8
Worst  Read OST indx: 0 speed: 180.625987
Best   Read OST indx: 6 speed: 214.961331
Read Average: 200.478485 +/- 11.408814 MB/s
Worst  Write OST indx: 0 speed: 291.709350
Best   Write OST indx: 6 speed: 496.616135
Write Average: 397.025375 +/- 59.815286 MB/s
Ost#  Read(MB/s)  Write(MB/s)  Read-time  Write-time
----------------------------------------------------
0     180.626      291.709       0.166      0.103
1     206.211      396.815       0.145      0.076
2     207.928      356.645       0.144      0.084
3     197.543      384.335       0.152      0.078
4     206.908      403.361       0.145      0.074
5     205.670      470.235       0.146      0.064
6     214.961      496.616       0.140      0.060
7     183.981      376.487       0.163      0.080

Are these results any good? To me they look very disappointing: we can get 3 GB/s from the RAID controller aggregating a collection of raw SAS drives on the OSTs, and we should be able to get a peak of ~5 GB/s from QDR IB.

First question: is this baseline reasonable?
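One rough way to sanity-check the numbers above is to redo the arithmetic on the per-OST rows. The sketch below recomputes the read and write averages, expresses the read average as a fraction of a link peak, and multiplies each read speed by its read time to estimate how much data each test moved. The peak constants are assumptions: 125 MB/s is the raw 1 Gb/s line rate for GbE, while 5000 MB/s and 3000 MB/s are simply the figures quoted in this post. The function and variable names are illustrative only.

```python
# Per-OST rows copied from the two ost-survey runs above:
# (read MB/s, write MB/s, read time s, write time s)
GBE_ROWS = [
    (59.931, 65.722, 0.501, 0.456),
    (57.704, 42.436, 0.520, 0.707),
    (60.056, 66.074, 0.500, 0.454),
    (61.655, 28.564, 0.487, 1.050),
    (58.096, 62.979, 0.516, 0.476),
    (59.935, 70.976, 0.501, 0.423),
    (61.053, 57.724, 0.491, 0.520),
    (59.856, 49.507, 0.501, 0.606),
]
IB_ROWS = [
    (180.626, 291.709, 0.166, 0.103),
    (206.211, 396.815, 0.145, 0.076),
    (207.928, 356.645, 0.144, 0.084),
    (197.543, 384.335, 0.152, 0.078),
    (206.908, 403.361, 0.145, 0.074),
    (205.670, 470.235, 0.146, 0.064),
    (214.961, 496.616, 0.140, 0.060),
    (183.981, 376.487, 0.163, 0.080),
]

# Assumed peaks in MB/s. GbE is the raw 1 Gb/s line rate; the other two
# are the figures quoted in the post, not measurements.
GBE_PEAK = 125.0
IB_PEAK = 5000.0
RAID_PEAK = 3000.0


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def implied_read_mb(rows):
    """Estimate MB moved per read test as speed * time, averaged over OSTs."""
    return mean(read * read_time for read, _, read_time, _ in rows)


def summarize(rows, link_peak):
    """Recompute averages and compare the read average to an assumed peak."""
    read_avg = mean(r[0] for r in rows)
    write_avg = mean(r[1] for r in rows)
    return {
        "read_avg": read_avg,
        "write_avg": write_avg,
        "read_fraction_of_link": read_avg / link_peak,
        "implied_read_mb": implied_read_mb(rows),
    }


if __name__ == "__main__":
    for name, rows, peak in (("GbE", GBE_ROWS, GBE_PEAK), ("QDR IB", IB_ROWS, IB_PEAK)):
        s = summarize(rows, peak)
        print(
            f"{name}: read {s['read_avg']:.1f} MB/s "
            f"({s['read_fraction_of_link']:.1%} of assumed peak), "
            f"write {s['write_avg']:.1f} MB/s, "
            f"~{s['implied_read_mb']:.0f} MB per read test"
        )
```

In both runs the speed-times-time product comes out close to 30 MB per OST, which suggests each test moved only a small file against one OST at a time. If that reading is right, these figures are per-OST, short-transfer rates rather than aggregate throughput, so they are not directly comparable to the 3 GB/s and ~5 GB/s peaks.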
Second question: what tools can I use to better understand the Lustre FS behavior and characterize the performance I am getting on the client side?

I checked the IB network and did not record any IB network errors during these runs, so I am confident that the IB network was working properly.

Looking forward to better understanding Lustre performance.
