At 11:02 PM 2/20/2009, Jordan Mendler wrote:
I am prototyping GlusterFS with ~50-60TB of raw disk space across non-raided disks in ~30 compute nodes. I initially grouped the nodes into pairs and created a replicate (AFR) volume across corresponding single drives in each pair of servers. I then created a stripe across the 33 resulting AFR groups, first with a 1MB block size and later with the default. With these configurations I am only seeing throughput of about 15-25 MB/s, despite a full Gig-E network.
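As a point of reference, here is roughly what that layout looks like as a client volfile, trimmed to two replicate pairs. The host and brick names are placeholders, and option syntax varies a bit between GlusterFS releases, so treat this as a sketch rather than a drop-in config:

    # hypothetical client volfile: stripe over replicate (AFR) pairs
    volume srv1-brick
      type protocol/client
      option transport-type tcp
      option remote-host server1        # placeholder host name
      option remote-subvolume brick
    end-volume

    volume srv2-brick
      type protocol/client
      option transport-type tcp
      option remote-host server2
      option remote-subvolume brick
    end-volume

    volume srv3-brick
      type protocol/client
      option transport-type tcp
      option remote-host server3
      option remote-subvolume brick
    end-volume

    volume srv4-brick
      type protocol/client
      option transport-type tcp
      option remote-host server4
      option remote-subvolume brick
    end-volume

    volume rep1                         # each drive pair mirrored with AFR
      type cluster/replicate
      subvolumes srv1-brick srv2-brick
    end-volume

    volume rep2
      type cluster/replicate
      subvolumes srv3-brick srv4-brick
    end-volume

    volume stripe0                      # stripe across the replicated pairs
      type cluster/stripe
      option block-size 1MB             # the 1 MB block size tried above
      subvolumes rep1 rep2
    end-volume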

What is generally the recommended configuration for a large striped environment? I am wondering whether the number of nodes in the stripe is causing too much overhead, or whether the bottleneck is likely somewhere else. In addition, I saw a thread on the list indicating that it is better to replicate across stripes rather than stripe across replicates. Does anyone have any comments or opinions on this?

I think that's all guesswork; I'm not sure anyone has done a thorough test of those choices with Gluster 2.0. Personally, from a data management perspective, I'd rather replicate and then stripe, so that I know each node in a replica holds exactly the same data. With striping and then replicating, I imagine some data that sits on one node in one stripe set could end up spread across two nodes in the other stripe set, which causes problems if you have to take the cluster apart or deal with the raw bricks later.
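For concreteness, flipping the layers would mean building the stripe sets first and mirroring them on top, along these lines (reusing the placeholder brick volumes from the sketch above):

    volume stripeA                      # first stripe set
      type cluster/stripe
      subvolumes srv1-brick srv3-brick
    end-volume

    volume stripeB                      # second stripe set
      type cluster/stripe
      subvolumes srv2-brick srv4-brick
    end-volume

    volume rep0                         # replicate across the stripe sets
      type cluster/replicate
      subvolumes stripeA stripeB
    end-volume

Here the two stripe sets only line up block-for-block if they have the same width and block size, which is exactly the bookkeeping worry described above.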

However, if you have the time, it'd be great to see the results of testing with a 15-node stripe and a 10-node stripe, to see how those numbers compare with the 30-node stripe you have now.
Then flip the replication order and run the same tests again.
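If you do run those tests, a simple sequential dd against the mount point gives numbers you can compare across configurations; /mnt/gluster and the file size here are just placeholders:

    # sequential write; conv=fsync flushes data before dd reports a rate
    dd if=/dev/zero of=/mnt/gluster/ddtest bs=1M count=4096 conv=fsync

    # drop the Linux page cache (as root), then read the file back
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/mnt/gluster/ddtest of=/dev/null bs=1M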

Keith



_______________________________________________
Gluster-users mailing list
[email protected]
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
