We are about to abandon GlusterFS as a solution for our object storage needs.  
I'm hoping to get some feedback to tell me whether we have missed something and 
are making the wrong decision.  We're already a year into this project after 
evaluating a number of solutions.  I'd hate to abandon GlusterFS if we've simply 
misunderstood how it works.

Our use case is fairly straightforward.  We need to save a bunch of somewhat 
large files (1MB-100MB).  For the most part, these files are write once, read 
several times.  Our initial store is 80TB, but we expect to grow to roughly 
320TB fairly quickly.  After that, we expect to add another 80TB every few 
months.  We are using COTS servers which we add in pairs; each server has 
40TB of usable storage.  We intend to keep two copies of each file.  We 
currently run 4TB bricks.
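
To keep the arithmetic straight, here is the layout math as we understand it 
(a sketch only; the exact replica-2 pairing across each server pair is my 
assumption, not a verified description of our volume):

    # Back-of-the-envelope capacity math for one server pair.
    # ASSUMPTION: replica 2 is laid out across the pair, i.e. each
    # replica set takes one 4TB brick from each server.
    server_tb = 40                              # usable TB per server
    brick_tb = 4                                # brick size we run
    bricks_per_server = server_tb // brick_tb   # 10 bricks per server
    replica_sets_per_pair = bricks_per_server   # 10 distribute subvolumes
    unique_tb_per_pair = replica_sets_per_pair * brick_tb
    print(bricks_per_server, replica_sets_per_pair, unique_tb_per_pair)
    # -> 10 10 40  (80TB raw per pair holds 40TB of unique data)

In other words, each 80TB pair holds 40TB of unique data once both copies are 
counted, if I have the replica layout right.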

In our somewhat limited test environment, GlusterFS seemed to work well, and 
our initial introduction of GlusterFS into our production environment went 
smoothly.  We had our initial two-server (80TB) cluster about 50% full and 
everything seemed fine.

Then we added another pair of servers (for a total of 160TB).  This went fine 
until we did the rebalance.  We were running 3.3.1 and ran into the handle 
leak problem (which, unfortunately, we didn't know about beforehand).  We also 
found that if any of the bricks went offline while the rebalance was running, 
files were lost or lost their permissions.  We still don't know why some of 
the bricks went offline, but they did, and we have verified in our test 
environment that a brick going offline mid-rebalance is sufficient to cause 
the corruption problem.

The good news is that we think both of these problems were fixed in 3.4.1.  So 
why are we leaving?

In trying to figure out what was going on with our GlusterFS system after the 
disastrous rebalance, we ran across two posts.  The first was 
http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/.  If we 
understand it correctly, any time you add new storage servers to your cluster 
you have to do a rebalance, and that rebalance requires a minimum of 50% of 
the data in the cluster to be moved to make the hashing algorithm work.  This 
means that when we have a 320TB cluster and add another 80TB, we have to move 
at least 160TB just to get things back into balance.  We estimate that will 
take months; it probably won't finish before we need to add another 80TB.
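
To make that concrete, here is a toy model of the re-split (this is my 
simplified reading of the hekafs post, not Gluster's actual layout code; the 
subvolume counts assume 4TB bricks in replica-2 pairs, and the 50 MB/s rate 
is purely a guess):

    # Toy model of a DHT-style rebalance: each distribute subvolume
    # owns an equal slice of the hash space, and the slices are
    # reassigned from scratch when subvolumes are added.  This is an
    # illustration only, NOT Gluster's actual algorithm.
    import random

    def subvol_for(h, n):
        # map a hash in [0, 1) to a subvolume under an equal split
        return int(h * n)

    def moved_fraction(old_n, new_n, samples=1_000_000):
        # fraction of files whose owning subvolume changes
        moved = 0
        for _ in range(samples):
            h = random.random()
            if subvol_for(h, old_n) != subvol_for(h, new_n):
                moved += 1
        return moved / samples

    # 320TB of 4TB bricks in replica-2 pairs = 40 distribute
    # subvolumes; adding 80TB brings 10 more.
    print(f"full re-split moves: {moved_fraction(40, 50):.0%}")  # ~95%
    print(f"ideal minimum:       {10 / 50:.0%}")                 # 20%

    # Rough duration at an ASSUMED sustained rate of 50 MB/s:
    tb_to_move = 160            # the 50%-of-320TB figure from above
    days = tb_to_move * 1e12 / 50e6 / 86400
    print(f"~{days:.0f} days at 50 MB/s")                        # ~37 days

The 50% figure from the post presumably reflects something smarter than this 
worst-case full re-split, but either way it is far more than the ~20% you 
would move if only the new bricks received data.  And 37 days assumes a 
single sustained 50 MB/s with no production traffic competing; at the rates 
we would realistically see, that stretches into months, which is where our 
estimate comes from.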

The other post we ran across was 
http://www.gluster.org/community/documentation/index.php/Planning34/ElasticBrick.
It seems to confirm our understanding of the rebalance: it's a discussion of 
the rebalance problem and a possible solution, apparently considered for 3.4, 
but it didn't make the cut.

I'd be happy to find out that we just got it wrong.  Tell me that rebalancing 
doesn't work the way we think.  Or tell me we should just configure things 
differently.

My problem is that if GlusterFS isn't good for starting with a small cluster 
(80TB) and growing over time to half a petabyte, what is the use case it is 
intended for?  Do you really have to start out with the amount of storage you 
think you'll need in the long run and just fill it up as you go?  That's why 
I'm nervous about our understanding of the rebalance.  It's hard to believe it 
works this way (at least from our perspective).

We have a lot of man-hours invested in writing code and building infrastructure 
for GlusterFS.  We can likely reuse much of it for another system.  I would just 
like to know that we really do understand the rebalance and that it really 
works the way I described it before we start evaluating other object store 
solutions.

Comments?

Scott

