At 11:02 PM 2/20/2009, Jordan Mendler wrote:
I am prototyping GlusterFS with ~50-60TB of raw disk space across
non-raided disks in ~30 compute nodes. I initially separated the
nodes into pairs and replicated each single drive onto its counterpart
in the other server of the pair. Next I striped across the 33
resulting AFR groups, first with a block size of 1MB and later with
the default block size. With these configurations I am only seeing
throughput of about 15-25 MB/s, despite a full Gig-E network.
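For concreteness, the client-side volfile is structured roughly like
the sketch below; the hostnames (node1, node2) and brick names are
placeholders, and only one of the 33 AFR pairs is written out:

  # one protocol/client volume per remote drive
  volume node1-drive1
    type protocol/client
    option transport-type tcp
    option remote-host node1          # placeholder hostname
    option remote-subvolume brick1    # export defined in the server volfile
  end-volume

  volume node2-drive1
    type protocol/client
    option transport-type tcp
    option remote-host node2
    option remote-subvolume brick1
  end-volume

  # replicate each drive across the pair of servers
  volume afr1
    type cluster/replicate
    subvolumes node1-drive1 node2-drive1
  end-volume

  # ... afr2 through afr33 are defined the same way ...

  # stripe across all of the resulting AFR groups
  volume stripe0
    type cluster/stripe
    option block-size 1MB             # also tested with the default (128KB)
    subvolumes afr1 afr2 afr3         # ... through afr33
  end-volume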
What is generally the recommended configuration in a large striped
environment? I am wondering if the number of nodes in the stripe is
causing too much overhead, or if the bottleneck is likely somewhere
else. In addition, I saw a thread on the list indicating that it is
better to replicate across stripes rather than stripe across
replicates. Does anyone have any comments or opinions on this?
I think that's all guesswork; I'm not sure anyone has done a thorough
test of those choices with Gluster 2.0.
Personally, from a data management perspective, I'd rather replicate,
then stripe, so that I know each node in a replica pair holds exactly
the same data. If you stripe first and then replicate, I imagine a
given piece of data could sit on one node in one stripe set but on
two nodes in another, which becomes a problem if you ever have to
take the cluster apart or deal with the raw data later.
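To make the contrast concrete, flipping the order would look something
like this sketch (same placeholder brick names as above, two stripe
sets shown):

  # stripe across one drive per server on each half first...
  volume stripe-a
    type cluster/stripe
    subvolumes node1-drive1 node3-drive1 node5-drive1   # ... odd-numbered nodes
  end-volume

  volume stripe-b
    type cluster/stripe
    subvolumes node2-drive1 node4-drive1 node6-drive1   # ... even-numbered nodes
  end-volume

  # ...then replicate the two stripe sets against each other
  volume mirror0
    type cluster/replicate
    subvolumes stripe-a stripe-b
  end-volume

Nothing in that arrangement ties a given drive on side A to a specific
drive on side B, which is the failure mode I mean: the same data living
on one node in one stripe set but spread across two nodes in the other.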
However, if you have the time, it'd be great to see the results of
your testing with a 15-node stripe and a 10-node stripe, to see how
those numbers compare with the 30-node stripe you have now.
Then flip the replication order and run the same tests again.
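If you run those, a plain dd pass from a client mount gives a
first-order number; the mount point and file size below are
placeholders:

  # sequential write: push 4GB through the stripe, syncing at the end
  dd if=/dev/zero of=/mnt/gluster/testfile bs=1M count=4096 conv=fsync

  # drop the page cache (as root) so the read actually hits the network
  echo 3 > /proc/sys/vm/drop_caches

  # sequential read back
  dd if=/mnt/gluster/testfile of=/dev/null bs=1M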
Keith
_______________________________________________
Gluster-users mailing list
[email protected]
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users