On 2010-03-03, at 12:50, Jagga Soorma wrote:

> I have just deployed a new Lustre FS with 2 MDS servers, 2 active
> OSS servers (5x2TB OSTs per OSS) and 16 compute nodes.
Does this mean you are using 5 2TB disks in a single RAID-5 OST per OSS (i.e. total OST size is 8TB), or are you using 5 separate 2TB OSTs?

> Attached are our findings from the iozone tests and it looks like
> the iozone throughput tests have demonstrated almost linear
> scalability of Lustre except for when WRITING files that exceed
> 128MB in size. When multiple clients create/write files larger than
> 128MB, Lustre throughput levels off at approximately 1GB/s. This
> behavior has been observed with almost all tested block size ranges
> except for 4KB. I don't have any explanation as to why Lustre
> performs poorly when writing large files.
>
> Has anyone experienced this behaviour? Any comments on our findings?

The default value of the client tunable max_dirty_mb is 32MB per OSC (i.e. the maximum amount of unwritten dirty data per OST before the process submitting IO blocks). If you have 2 OSTs/OSCs and a stripe count of 2, then you can cache up to 64MB on the client without having to wait for any RPCs to complete. That is why you see a performance cliff for writes beyond the 32MB-per-OSC limit.

It should also be clear that the read graphs are meaningless, due to local caching of the file. I'd hazard a guess that you are not actually getting 100GB/s from 2 OSS nodes.

Also, what is the interconnect on the client? If you are using a single 10GigE link then 1GB/s is about as fast as you can possibly write large files to the OSTs, regardless of the striping.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
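The back-of-the-envelope numbers behind the explanation above can be sketched as follows. This is only a rough model: max_dirty_mb=32 and a stripe count of 2 are the values discussed in the thread, and the 80% link-efficiency derating for 10GigE is an assumed factor, not a measurement.

```python
# Rough model of the two limits discussed in the thread:
# (1) how much dirty data a client can cache before writes block, and
# (2) the payload ceiling of a single 10GigE client link.

def client_dirty_cache_mb(max_dirty_mb: int, stripe_count: int) -> int:
    """Dirty data a client may cache for one file before the writing
    process blocks: one max_dirty_mb quota per OSC the file is striped
    over (assumes the file's stripe count equals the OSCs in use)."""
    return max_dirty_mb * stripe_count

def wire_rate_gbytes_per_s(link_gbit: float, efficiency: float = 0.8) -> float:
    """Approximate payload rate of a network link: raw bit rate divided
    by 8, derated by an assumed protocol-overhead efficiency factor."""
    return link_gbit / 8.0 * efficiency

print(client_dirty_cache_mb(32, 2))         # 64 (MB cached before RPCs must complete)
print(round(wire_rate_gbytes_per_s(10), 2)) # 1.0 (GB/s ceiling on one 10GigE link)
```

On a client, the current per-OSC value can be inspected with `lctl get_param osc.*.max_dirty_mb` and raised with `lctl set_param` if the workload would benefit from more cached dirty data per OST.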
