If you are only adding disk space and don't necessarily need to increase
bandwidth, then you won't need to rebalance. It's only a problem if you
are adding clients and your most frequently accessed files all sit on
the same brick.
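To illustrate the point, here is a toy sketch in Python (my own
simplification, not Gluster's actual Davies-Meyer hash or layout code):
give each brick an equal slice of the 32-bit hash space, hash file names
into it, and new writes spread evenly across all bricks, so existing
data can stay where it is. As I understand it, a rebalance fix-layout
pass is enough to give new bricks their hash ranges without migrating
existing files.

import hashlib

def brick_for(name, nbricks):
    # Equal-range split of the 32-bit hash space; md5 stands in for
    # Gluster's real hash function.
    h = int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32
    return h * nbricks // 2**32

# 100,000 new files across 4 bricks land roughly 25,000 per brick,
# so capacity-only growth fills the new bricks through new writes
# without moving anything already stored.
counts = [0] * 4
for i in range(100000):
    counts[brick_for("file-%06d" % i, 4)] += 1
print(counts)

A second sketch after the quoted thread below uses the same model to
check the "50% of the data must move" figure that Scott cites.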
On Thu, 2013-12-12 at 04:49 +0000, Scott Smith wrote:
> Pretty much, our files are never deleted. We just keep adding more
> information. Think of them as write once, read multiple, delete never.
>
> -----Original Message-----
> From: Franco Broi [mailto:[email protected]]
> Sent: Wednesday, December 11, 2013 7:31 PM
> To: Scott Smith
> Cc: [email protected]
> Subject: Re: [Gluster-users] Is Gluster the wrong solution for us?
>
> How long-lived are your files? We have 400TB and are just about to
> double that, but we have decided not to rebalance the data; instead we
> are hoping that the disks will rebalance naturally through attrition
> rather than waste valuable time and bandwidth moving data around.
>
> On Thu, 2013-12-12 at 01:15 +0000, Scott Smith wrote:
> > We are about to abandon GlusterFS as a solution for our object
> > storage needs. I'm hoping to get some feedback to tell me whether we
> > have missed something and are making the wrong decision. We're
> > already a year into this project after evaluating a number of
> > solutions. I'd like not to abandon GlusterFS if we just
> > misunderstand how it works.
> >
> > Our use case is fairly straightforward. We need to save a bunch of
> > somewhat large files (1MB-100MB). For the most part, these files are
> > write once, read several times. Our initial store is 80TB, but we
> > expect to go to roughly 320TB fairly quickly. After that, we expect
> > to be adding another 80TB every few months. We are using some COTS
> > servers which we add in pairs; each server has 40TB of usable
> > storage. We intend to keep two copies of each file. We currently run
> > 4TB bricks.
> >
> > In our somewhat limited test environment, GlusterFS seemed to work
> > well, and our initial introduction of GlusterFS into our production
> > environment went well. We had our initial 2-server (80TB) cluster
> > about 50% full and things seemed to be going well.
> >
> > Then we added another pair of servers (for a total of 160TB). This
> > went fine until we did the rebalance. We were running 3.3.1. We ran
> > into the handle leak problem (which unfortunately we didn't know
> > about beforehand). We also found that if any of the bricks went
> > offline while the rebalance was going on, files were lost or lost
> > their permissions. We still don't know why some of the bricks went
> > offline, but they did, and we have verified in our test environment
> > that this is sufficient to cause the corruption problem.
> >
> > The good news is that we think both of these problems got fixed in
> > 3.4.1. So why are we leaving?
> >
> > In trying to figure out what was going on with our GlusterFS system
> > after the disastrous rebalance, we ran across two posts. The first
> > one was
> > http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/.
> > If we understand it correctly, any time you add new storage servers
> > to your cluster, you have to do a rebalance, and that rebalance will
> > require a minimum of 50% of the data in the cluster to be moved to
> > make the hashing algorithms work. This means that when we have a
> > 320TB cluster and add another 80TB, we have to move at least 160TB
> > just to get things back into balance. Our estimate is that the move
> > will take months. It probably won't finish before we need to add
> > another 80TB.
> >
> > The other post we ran across was
> > http://www.gluster.org/community/documentation/index.php/Planning34/ElasticBrick.
> > This post seems to confirm our understanding of the rebalance. It
> > appears to be a discussion of the rebalance problem and a possible
> > solution. It was apparently discussed for 3.4 but didn't make the
> > cut.
> >
> > I'd be happy to find out that we just got it wrong. Tell me that
> > rebalancing doesn't work the way we think, or that we should be
> > configuring things differently.
> >
> > My problem is that if GlusterFS isn't good for starting with a small
> > cluster (80TB) and growing over time to half a petabyte, what is the
> > use case it is intended for? Do you really have to start out with
> > the amount of storage you think you'll need in the long run and just
> > fill it up as you go? That's why I'm nervous about our understanding
> > of the rebalance. It's hard to believe it works this way (at least
> > from our perspective).
> >
> > We have a lot of man-hours invested in writing code and putting
> > infrastructure in place for GlusterFS. We can likely reuse much of
> > it for another system. I would just like to know that we really do
> > understand the rebalance and that it really works the way I
> > described it before we start evaluating other object store
> > solutions.
> >
> > Comments?
> >
> > Scott

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
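As a rough sanity check of the "at least 50% moved" figure from the
hekafs.org post above, here is a toy Python model (again my own
simplification: md5 stands in for Gluster's real hash, and the layout
is the naive equal-range re-split the post describes):

import hashlib

def brick_for(name, nbricks):
    # Equal-range split of the 32-bit hash space.
    h = int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32
    return h * nbricks // 2**32

names = ["file-%06d" % i for i in range(100000)]
moved = sum(brick_for(n, 2) != brick_for(n, 4) for n in names)
print("%.0f%% of files change bricks going from 2 to 4 bricks"
      % (100.0 * moved / len(names)))

This prints roughly 75%: with a naive equal re-split, doubling the
brick count leaves three quarters of the files on the "wrong" brick,
comfortably above the 50% floor Scott describes. The "months" estimate
is also plausible on raw arithmetic: assuming a sustained 100MB/s,
moving 160TB is about 1.6 million seconds, or roughly 18 days of pure
copying, and a real rebalance walks the namespace file by file at well
below wire speed.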
