On 11/21/2013 01:08 PM, James wrote:
On Wed, 2013-11-20 at 18:30 +0530, Lalatendu Mohanty wrote:
On 11/12/2013 05:54 AM, James wrote:
Hi there,

This is a hypothetical problem, not one that describes specific hardware
at the moment.

As we all know, gluster currently usually works best when each brick is
the same size, and each host has the same number of bricks. Let's call
this a "homogeneous" configuration.

Suppose you buy the hardware to build such a pool. Two years go by, and
you want to grow the pool. Changes in drive size, hardware, cpu, etc
will be such that it won't be possible (or sensible) to buy the same
exact hardware, sized drives, etc... A heterogeneous pool is
unavoidable.

Is there a general case solution for this problem? Is something planned
to deal with this problem? I can only think of a few specific corner
case solutions.
I am not sure what issues you are expecting when a heterogeneous
configuration is used, as Gluster is intelligent enough to handle
sub-volumes/bricks of different sizes. So I think a heterogeneous
configuration should not be an issue for Gluster. Let us know what
corner cases you have in mind (maybe this will give me some
pointers to think about :)).
I am thinking about performance differences due to an imbalance of data
stored on type A hosts versus type B hosts. I am also thinking about
performance differences simply due to older versus newer hardware. Even
at the interconnect level there could be significant differences (e.g.
Gigabit vs. 10GbE, etc.).

I'm not entirely sure how well Gluster can keep the data proportionally
balanced (e.g. each brick has 60% or 70% free space, independent of the
actual GB stored) if there is a significant enough difference in the
size of the bricks. Any idea?


The dynamic hashing algorithm automatically keeps data fairly well distributed, but not in 100% of cases, since the hash value depends on the file name. However, Gluster can create the data on another brick if one brick is full. The user can decide, through a volume set command (the cluster.min-free-disk option), at what percentage of usage Gluster should consider a brick full. There is a nice blog post from Jeff about it:

http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/
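To illustrate why the distribution is size-blind, here is a toy Python sketch. It is purely illustrative and is not Gluster's actual DHT code (which assigns hash ranges per directory using a Davies-Meyer hash); the brick names and sizes are made up. It just shows that name-hash placement sends roughly equal file counts to each brick regardless of brick capacity, which is why a smaller brick fills up first and why the min-free-disk safety valve matters:

```python
# Toy sketch of name-hash placement (NOT Gluster's real DHT implementation).
# Placement depends only on the file name, never on brick size.
import hashlib

def pick_brick(filename, bricks):
    # Hash the file name and map the hash onto the brick list.
    h = int(hashlib.md5(filename.encode()).hexdigest(), 16)
    return bricks[h % len(bricks)]

# Hypothetical heterogeneous pool: a small brick and a large brick.
bricks = ["brick-small-1TB", "brick-large-4TB"]
counts = {b: 0 for b in bricks}
for i in range(10000):
    counts[pick_brick("file-%d.dat" % i, bricks)] += 1

# Each brick receives roughly half the files despite the 4x size
# difference, so the small brick runs out of space first.
print(counts)
```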
Another problem that comes to mind is ensuring that the older, slower
servers don't act as bottlenecks for the whole pool.
I think this is unavoidable, but the timeline for this kind of change
will be around 10 to 15 years. However, we can replace bricks if the old
servers really slow the whole thing down.
Well I think it's particularly elegant that Gluster works on commodity
hardware, but it would be ideal if it worked with heterogeneous hardware
in a much more robust way. The ideas jdarcy had mentioned seem like they
might solve these problems in a nice way, but afaik they're just ideas
and not code yet.

Agreed! Storage tiering is an awesome idea. As you mentioned, it would also solve the performance issues in a heterogeneous setup.
jdarcy had mentioned that gluster might gain some notion of tiering, to
support things like SSDs in one part of the volume and slow drives at
the other end. Maybe this sort of architecture can be used to solve the
same problems.

Thoughts and discussion welcome.

Cheers,
James



_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
