Hi,

I am using Lustre 1.6.0.1 with one OST and 20 clients in an HPC cluster. The combined OST/MDT/MGS server has a 16-channel 3ware 9650 configured as RAID 6. I also have another Lustre installation (version 1.4.5) that has been working trouble-free for over a year. The OS is CentOS 4. The storage server has 4 network ports in adaptive load balancing mode, and the aggregate network throughput is great (tested with 4 x netperf/iperf from clients) in the ideal case where the clients pick up the MAC addresses of different interfaces in their ARP tables.
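
For reference, the bonding setup on the storage server looks roughly like the following (device names and addresses are placeholders, and this is a sketch from memory rather than the exact files):

    # /etc/modprobe.conf
    alias bond0 bonding
    options bond0 mode=balance-alb miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.1.10
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (and likewise eth1-eth3)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

The throughput test was simply iperf -s on the server and four simultaneous iperf -c <server> runs from different clients.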

I have a few questions about Lustre and hope someone can help me.

* I had to re-export the Lustre volume via NFS on the new 1.6.0.1 setup to other infrastructure boxes.
After the export, I get the following messages on the OSS -


Jun 21 09:31:11 lustre-3ware kernel: Lustre: 4946:0:(lustre_fsfilt.h:205:fsfilt_start_log()) scratch-OST0000: slow journal start 33s
Jun 21 09:31:11 lustre-3ware kernel: Lustre: 4946:0:(lustre_fsfilt.h:205:fsfilt_start_log()) Skipped 22 previous similar messages
Jun 21 09:31:11 lustre-3ware kernel: Lustre: 4874:0:(filter.c:1139:filter_parent_lock()) scratch-OST0000: slow parent lock 33s
Jun 21 09:31:11 lustre-3ware kernel: Lustre: 4874:0:(filter.c:1139:filter_parent_lock()) Skipped 6 previous similar messages

Also, is the NFS re-export option stable in version 1.6? I read some earlier posts on the list reporting kernel panics with re-export on Lustre 1.4.
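
For context, the re-export itself is nothing exotic; the node doing it has the filesystem mounted as a regular Lustre client and an /etc/exports line roughly like this (mount point, network and host name are placeholders):

    # /etc/exports on the re-exporting Lustre client
    /mnt/scratch  192.168.1.0/255.255.255.0(rw,sync,no_root_squash)

    # on the plain NFS boxes
    mount -t nfs <re-exporting-node>:/mnt/scratch /mnt/scratch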


*I was evaluating GFS with GNBD for the past few weeks, and the performance was amazing (at least for my purpose, with one storage server). It was very fast, especially for small files. But I had to drop it for stability reasons. The problems were these: it has 6 daemons that need to come up in a particular order (roughly sketched below), and if one of the kernel modules crashes under heavy load on a node, the whole cluster freezes. There is also the quorum requirement, which is useful in an HA setup but maybe not for HPC. In some cases, for example during a power failure, I need to keep just one server running that re-exports the volume via NFS even if the HPC nodes are down, and quorum is a problem then. But it was mostly the stability that made me decide against GFS + GNBD.
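
(For completeness, the startup ordering I mean is roughly the RHEL4 cluster suite sequence below; the exact service names may be slightly off, so treat this as a sketch:)

    service ccsd start
    service cman start
    service fenced start
    service clvmd start
    gnbd_import -i <gnbd-server>   # import the GNBD devices on each node
    service gfs start              # mount the GFS filesystems from /etc/fstab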

*Now the problem - Lustre performance dips a lot when it comes to small files. Please see the following fileop -f 5 test comparing NFS and Lustre -


Lustre -
Fileop:  File size is 1,  Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
 .     mkdir  rmdir  create    read   write  close   stat  access  chmod  readdir   link  unlink  delete  Total_files
A  5    1654    691     132   14228     719   4874   1987   32737   1718     2506   1262    1340    1608          125


NFS -
Fileop:  File size is 1,  Output is in Ops/sec. (A=Avg, B=Best, W=Worst)
 .     mkdir  rmdir  create    read   write  close   stat  access  chmod  readdir   link  unlink  delete  Total_files
A  5     177    594     459  380747  137392   2282   1219  444312    502     1274    306     513     464          125

Could you please recommend any tunables to get a bit more performance out of Lustre with lots of small files? (Are the client-side /proc knobs sketched below the right sort of thing to look at?) Small-file performance was poor under GFS too, though it was still better than NFS.
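
To be concrete, these are the kind of knobs I mean; the values below are guesses, not settings I am recommending, so please correct me if these are the wrong things to touch:

    # number of DLM locks a client caches per namespace
    for f in /proc/fs/lustre/ldlm/namespaces/*/lru_size; do echo 1200 > $f; done

    # client read-ahead per mounted filesystem (MB)
    for f in /proc/fs/lustre/llite/*/max_read_ahead_mb; do echo 40 > $f; done

    # dirty cache per OSC (MB)
    for f in /proc/fs/lustre/osc/*/max_dirty_mb; do echo 32 > $f; done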

*Also, the read performance of Lustre seems to be a little behind NFS. In the new setup I moved /opt, which holds all the user software, to Lustre, but software like Matlab, Splus etc. takes almost a minute to come up. The second launch is very fast, presumably due to caching. So I am thinking of moving /opt
back to NFS. Is it possible to boost the read performance of Lustre a bit?
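
If it helps with diagnosis, I can gather the client-side read-ahead and RPC statistics; something like the following is what I had in mind (again just a sketch, run on a client):

    # per-mount read-ahead statistics
    cat /proc/fs/lustre/llite/*/read_ahead_stats

    # per-OSC RPC histogram, to see whether reads go out as many small RPCs
    cat /proc/fs/lustre/osc/*/rpc_stats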

*Is there a way to make disk quotas activate automatically at startup on a Lustre client? lfs quotaon <mount point> works sometimes, but other times it gives a "resource busy" error message. (A rough sketch of what I have in mind is below.)

*One last question. In the older Lustre setup (version 1.4.5), I have 5 SCSI drives, each one an OST, making up a single volume. The volume became full, but df still reported 27GB free. There doesn't seem to be an lfs df option in that version of Lustre, so I couldn't see the individual utilization of each of the 5 OSTs. Is this a striping problem?
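
For the quota question, what I have in mind is just running it from an init hook such as /etc/rc.local after the client mount; and for the 1.4.5 volume I have been trying to read per-OST free space out of /proc. Both are sketched below with placeholder names, assuming those proc entries exist in 1.4.5 - please tell me if there is a better way:

    # e.g. in /etc/rc.local on a 1.6 client, after the Lustre mount
    mount -t lustre mgsnode@tcp0:/scratch /mnt/scratch
    lfs quotacheck -ug /mnt/scratch    # only needed once, before quotas are first enabled
    lfs quotaon -ug /mnt/scratch

    # on a 1.4 client, rough per-OST free space from the OSC proc entries
    for f in /proc/fs/lustre/osc/*/kbytesfree; do echo "$f: `cat $f`"; done
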
I know it's a lot of questions. Hope some of them are solvable. Thanks very much.



Best Regards

Balagopal Pillai


_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
