Or would it be better to increase the stripe count for my Lustre filesystem to the maximum number of OSTs?
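For context, my understanding is that this would be set per-directory with lfs setstripe, so that new files inherit the layout; a sketch, where the mount point and directory name are hypothetical:

  # Stripe new files in this directory across all available OSTs (-c -1)
  lfs setstripe -c -1 /mnt/lustre/scratch

  # Verify the layout that new files will inherit
  lfs getstripe /mnt/lustre/scratch

Note that existing files keep their old layout; only files created after the change pick up the new stripe count.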
On Wed, Mar 3, 2010 at 3:27 PM, Jagga Soorma <[email protected]> wrote:
> On Wed, Mar 3, 2010 at 2:30 PM, Andreas Dilger <[email protected]> wrote:
>
>> On 2010-03-03, at 12:50, Jagga Soorma wrote:
>>
>>> I have just deployed a new Lustre FS with 2 MDS servers, 2 active OSS
>>> servers (5x2TB OSTs per OSS), and 16 compute nodes.
>>
>> Does this mean you are using 5 2TB disks in a single RAID-5 OST per OSS
>> (i.e. total OST size is 8TB), or are you using 5 separate 2TB OSTs?
>
> No, I am using 5 independent 2TB OSTs per OSS.
>
>>> Attached are our findings from the iozone tests. The iozone throughput
>>> tests demonstrated almost linear scalability of Lustre, except when
>>> writing files that exceed 128MB in size. When multiple clients
>>> create/write files larger than 128MB, Lustre throughput levels off at
>>> approximately 1GB/s. This behavior was observed with almost all tested
>>> block sizes except 4KB. I don't have any explanation for why Lustre
>>> performs poorly when writing large files.
>>>
>>> Has anyone experienced this behaviour? Any comments on our findings?
>>
>> The default for the client tunable max_dirty_mb is 32MB per OSC (i.e. the
>> maximum amount of unwritten dirty data per OST before the process
>> submitting IO is blocked). If you have 2 OSTs/OSCs and a stripe count of
>> 2, then you can cache up to 64MB on the client without having to wait
>> for any RPCs to complete. That is why you see a performance cliff for
>> writes beyond 32MB.
>
> So should true write performance be measured using files larger than
> 128MB? If we do see a large number of large files being created on the
> Lustre FS, is this something that can be tuned on the client side? If so,
> where/how can I get this done, and what would be the recommended settings?
>
>> It should be clear that the read graphs are meaningless, due to local
>> caching of the file. I'd hazard a guess that you are not getting 100GB/s
>> from 2 OSS nodes.
>
> Agreed. Is there a way to find out the size of the local cache on the
> clients?
>
>> Also, what is the interconnect on the client? If you are using a single
>> 10GigE link, then 1GB/s is as fast as you can possibly write large files
>> to the OSTs, regardless of the striping.
>
> I am using Infiniband (QDR) interconnects for all nodes.
>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Staff Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.
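Following up on the client-side tuning question above: max_dirty_mb is exposed per OSC through lctl on each client. A hedged sketch, assuming Lustre 1.8-style parameter paths; the value 64 is an illustrative assumption, not a recommendation:

  # On a client: show the current per-OSC dirty-data limit (in MB)
  lctl get_param osc.*.max_dirty_mb

  # Raise the limit to 64MB per OSC (illustrative value; tune for your
  # workload). Note that set_param changes do not survive a remount.
  lctl set_param osc.*.max_dirty_mb=64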
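On the question of the size of the local cache on the clients: my understanding is that the client read cache is bounded by the llite max_cached_mb parameter, which can be inspected the same way (parameter name assumed unchanged on your release):

  # On a client: show the upper bound on cached file data (in MB)
  lctl get_param llite.*.max_cached_mb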
