On 2009-12-02, at 12:15, Craig Tierney wrote:
> Andreas Dilger wrote:
>> On 2009-12-02, at 09:20, Francois Chassaing wrote:
>>> I have a big fundamental question: if the load that I'll put on the
>>> FS is more IOPS-intensive than throughput-intensive (because I'll
>>> access lots of medium-sized files, ~5 MB, from a small number of
>>> clients), would I be better off with Lustre or PVFS2?
>>
>> I don't think PVFS2 is necessarily better at IOPS than Lustre.  This
>> is mostly dependent upon the storage configuration.
>>
>>> Also, if the main load is IOPS, shouldn't I oversize the MDS/MDT in
>>> terms of CPU/RAM and storage performance (i.e. as many 15K SAS
>>> RAID-10 spindles as possible)?
>>
>> The Lustre MDS/MDT is used only at file lookup/open/close, and is not
>> involved during actual IO operations.  Still, this means in your case
>> that the MDS is getting 2 RPCs (open + close, which can be done
>> asynchronously in memory) for every 5 OST RPCs (5MB read/write, which
>> happen synchronously), so the MDS will definitely need to scale, but
>> not necessarily at 2/5 of the total OST size.
>>
>> Typical numbers for a high-end MDT node (16-core, 64GB of RAM, DDR IB)
>> are about 8-10k creates/sec and up to 20k lookups/sec from many
>> clients.
>>
>> Depending on the number of files you are planning to have in the
>> filesystem, I would suggest SSDs for the MDT filesystem, especially
>> if you have a large working set and are doing read-mostly access.
>
> Has anyone reported results of an SSD-based MDT?
We have done internal testing, and the performance for many workloads is
somewhat faster, but not a TON faster.  This is because Lustre is already
doing async IO on the MDS, unlike NFS, so decent streaming IO performance
and lots of RAM are enough to meet many of the create/lookup performance
targets.

If you have a huge filesystem that is doing a lot of random lookup,
create, and unlink operations (i.e. the working set is larger than the
MDS RAM - about 4kB per file for random operations, so 16M files on a
64GB MDS), then the high IOPS rate of SSDs will definitely make a huge
difference (i.e. keeping 20k lookups/sec on DDR instead of falling to
mdt_disks * 100).  Since that isn't a common workload for our customers,
we haven't done a lot of testing in that area, but it is definitely
something I'm curious about.

>>> On the budget side, may I use asynchronous DRBD to mirror the MDT
>>> (internal storage), or should I only get good shared storage
>>> (direct-attached or iSCSI)?
>>
>> Some people on this list have used DRBD, but we haven't tested it
>> ourselves.  I _suspect_ (though have not necessarily tested this)
>> that if you are using DRBD it would be possible to have
>> lower-performance storage on the backup server without significantly
>> impacting the primary server performance, if you are willing to run
>> slower in the rare case when you have failed over to the backup.
>>
>>> Today I'm leaning towards Lustre, because I've tested it against
>>> GlusterFS, and Gluster performed slightly worse than Lustre but
>>> failed the bonnie++ create/delete tests badly.  Also, I haven't
>>> given PVFS2 a shot yet...
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Sr. Staff Engineer, Lustre Group
>> Sun Microsystems of Canada, Inc.
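For what it's worth, the sizing arithmetic in this thread can be
sanity-checked with a quick back-of-the-envelope script.  The 4kB-per-file
footprint, the ~20k cached lookups/sec, and the roughly 100 random IOPS
per spinning MDT disk are the figures quoted above; everything else
(function names, the example disk counts) is purely illustrative, not any
official Lustre tuning guidance:

```python
# Back-of-the-envelope MDS sizing using the figures from this thread:
# ~4kB of metadata working set per file, ~20k lookups/sec when the
# working set fits in MDS RAM, and roughly 100 random IOPS per
# rotating MDT disk once lookups have to go to disk.

BYTES_PER_FILE = 4 * 1024    # ~4kB metadata footprint per file
IOPS_PER_SPINDLE = 100       # rough random-IOPS floor per MDT disk

def files_cached(mds_ram_gb):
    """How many files fit in MDS RAM before random lookups hit disk."""
    return mds_ram_gb * 1024**3 // BYTES_PER_FILE

def lookup_rate(num_files, mds_ram_gb, mdt_disks, cached_rate=20_000):
    """Estimated lookups/sec: the cached rate if the working set fits
    in RAM, otherwise roughly mdt_disks * 100 on rotating disks."""
    if num_files <= files_cached(mds_ram_gb):
        return cached_rate
    return mdt_disks * IOPS_PER_SPINDLE

# A 64GB MDS caches about 16M files, matching the estimate above.
print(files_cached(64))                            # -> 16777216

# 10M files on a 64GB MDS: working set fits, lookups stay fast.
print(lookup_rate(10_000_000, 64, mdt_disks=8))    # -> 20000

# 100M files: the working set blows the cache, and an 8-spindle
# rotating MDT bottoms out around 800 lookups/sec.
print(lookup_rate(100_000_000, 64, mdt_disks=8))   # -> 800

# The RPC ratio for the 5MB-file workload: 2 MDS RPCs (open + close)
# versus 5 OST bulk RPCs (one per MB of data).
print("MDS:OST RPC ratio = 2:5")
```

The last case is exactly where SSDs for the MDT pay off: the per-lookup
cost becomes a flash IOP instead of a disk seek.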
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
> --
> Craig Tierney ([email protected])

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
