Re: [Gluster-users] Design/HW for cost-efficient NL archive >= 0.5PB?

BGM Thu, 02 Jan 2014 11:19:51 -0800


On 02.01.2014, at 18:06, Justin Dossey <[email protected]> wrote:

> 1) It depends on the number of drives per chassis, your tolerance for risk, 
> and the speed of rebuilds.  I'd recommend doing a couple of test rebuilds 
> with different array sizes to see how fast your controller and drives can 
> complete them, and then comparing the rebuild completion times to your SLA-- 
> if a rebuild takes two days to complete, is that good enough for you 
> (especially given the chances of another failure occurring during the 
> rebuild)?  All other things being equal, the smaller the array, the faster 
> the rebuild, but the more "wasted" space in the array.  Also note that many 
> controllers have tunable rebuild algorithms, so you can divert more resources 
> to completing rebuilds faster at the cost of performance.  One data point 
> from me: my last 16-2T-SATA RAID-6 rebuild took about 58 hours to complete.
> 
> 2) My understanding is that the way file reads work on GlusterFS, read 
> requests are sent to all nodes and the data is used from the first node to 
> respond to the request.  So if one node is busier than others, it is likely 
> to respond more slowly and thus receive a lower portion of the read activity, 
> as long as the files being read are larger than a single response.    
> 
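To put the rebuild data point above into numbers, here is a minimal sketch. It assumes rebuild time scales roughly linearly with per-drive capacity on the same controller and array width, which is an assumption and not something measured in this thread; the function names are hypothetical. A test rebuild on your own hardware remains the real answer.

```python
def effective_rebuild_rate_mb_s(drive_tb: float, hours: float) -> float:
    """Average rate at which the replacement drive was rewritten, in MB/s.

    Uses decimal units (1 TB = 1e6 MB).
    """
    return drive_tb * 1e6 / (hours * 3600)


def scaled_rebuild_hours(observed_hours: float,
                         observed_drive_tb: float,
                         target_drive_tb: float) -> float:
    """Extrapolate rebuild time to a different drive size.

    ASSUMPTION: time grows linearly with drive capacity; controller,
    array width and load are unchanged.
    """
    return observed_hours * target_drive_tb / observed_drive_tb


# Data point from the thread: 16x 2 TB SATA RAID-6, about 58 hours.
rate = effective_rebuild_rate_mb_s(drive_tb=2, hours=58)
hours_4tb = scaled_rebuild_hours(58, observed_drive_tb=2, target_drive_tb=4)

print(f"effective rebuild rate: {rate:.1f} MB/s")
print(f"estimated rebuild for 4 TB drives: {hours_4tb:.0f} hours")
```

Under that linear assumption, the same array built from 4TB drives would take roughly twice as long to rebuild, which is the number to compare against your SLA.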
> 
> On Wed, Jan 1, 2014 at 12:21 PM, Fredrik Häll <[email protected]> wrote:
>> Thanks for all the input!
>> 
>> It sure sounds like RAID-6 for disk failures and Gluster for the spanning 
>> and high level redundancy parts is a good candidate. 
>> 
>> Some final questions: 
>> 
>> 1) How big can one comfortably go in terms of RAID-6 array size, given 4TB 
>> SATA/SAS drives? On the one hand, much points to keeping as few arrays as 
>> possible, since that maximizes usable disk space. But there are 
>> complications in terms of rebuild times and the risk of losing 2 drives. 
>> Hot spares may also be an option. Your reflections?
>> 
>> 2) Is there any intelligence or automation in Gluster that makes smart use 
>> of dual (or multiple) replicas? Say that I have 2 replicas, and one of them 
>> is spending some effort on a RAID rebuild, is there functionality for 
>> manually or automatically preferring the other (healthy) replica?
>> 
>> Best regards, 
>> 
>> Fredrik
>> 
>> 
>> On Tue, Dec 31, 2013 at 10:27 PM, Justin Dossey <[email protected]> wrote:
>>> Yes, RAID-6 is better than RAID-5 in most cases.  I agonized over the 
>>> decision to deploy RAID-5 for my Gluster cluster, and the reason I did so 
>>> is that the number of drives in the brick was (IMO) acceptably low.  I use 
>>> RAID-6 for my 16-drive arrays, which means I have to lose 3 disks out of 
>>> the 16 to lose my data.  With 2x 8-drive arrays in RAID-5, losing 2 disks 
>>> in the same array is enough to lose that array, but if that happens I only 
>>> lose 50% of the data on the server, and all these bricks are 
>>> distribute-replicate anyway, so I wouldn't actually lose any data at all.  
>>> That consideration, paired with the fact 
>>> that I keep spares on hand and replace failed drives within a day or two, 
>>> means that I'm okay with running 2x RAID-5 instead of 1x RAID-6.  (2x 
>>> RAID-6 would put me below my storage target, forcing additional hardware 
>>> purchases.)
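The capacity side of that trade-off can be sketched as follows. This is a rough illustration, assuming 2TB drives in a 16-bay server (the drive size mentioned elsewhere in this thread) and ignoring filesystem overhead, hot spares and TB/TiB rounding; the function and layout names are hypothetical.

```python
# Parity drives consumed per array for each RAID level.
PARITY_DRIVES = {"raid5": 1, "raid6": 2}


def usable_tb(arrays: int, drives_per_array: int,
              drive_tb: float, level: str) -> float:
    """Raw usable capacity in decimal TB for identical arrays.

    ASSUMPTION: no filesystem overhead and no hot spares are counted.
    """
    parity = PARITY_DRIVES[level]
    if drives_per_array <= parity:
        raise ValueError("array too small for this RAID level")
    return arrays * (drives_per_array - parity) * drive_tb


# (arrays, drives per array, drive size in TB, RAID level) per server
layouts = {
    "1x 16-drive RAID-6": (1, 16, 2, "raid6"),
    "2x 8-drive RAID-5": (2, 8, 2, "raid5"),
    "2x 8-drive RAID-6": (2, 8, 2, "raid6"),
}

capacities = {name: usable_tb(*args) for name, args in layouts.items()}
for name, tb in capacities.items():
    print(f"{name}: {tb} TB usable")
```

Under these assumptions the first two layouts both give 28 TB per server and 2x RAID-6 gives 24 TB, which is consistent with the remark above that 2x RAID-6 would fall below the storage target.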
>>> 
>>> I suppose the short answer is "evaluate your storage needs carefully."
>>> 
>>> 
>>> On Tue, Dec 31, 2013 at 11:19 AM, James <[email protected]> wrote:
>>>> On Tue, Dec 31, 2013 at 11:33 AM, Justin Dossey <[email protected]> wrote:
>>>> >
>>>> > Yes, I'd recommend sticking with RAID in addition to GlusterFS.  The 
>>>> > cluster I'm mid-build on (it's a live migration) is 18x RAID-5 bricks on 
>>>> > 9 servers.  Each RAID-5 brick is 8 2T drives, so about 13T usable.  It's 
>>>> > better to deal with a RAID when a disk fails than to have to pull and 
>>>> > replace the brick, and I believe Red Hat's official recommendation is 
>>>> > still to minimize the number of bricks per server (which makes me a 
>>>> > rebel for having two, I suppose).  9 (slow-ish, SATA RAID) servers 
>>>> > easily saturate 1Gbit on a busy day.
>>>> 
>>>> 
>>>> I think Red Hat also recommends RAID6 instead of RAID5. In any case, I
>>>> sure do, at least.
>>>> 
>>>> James
>>>> 
>>>> 
>>>> 
>>>> On Mon, Dec 30, 2013 at 5:54 AM, bernhard glomm
>>>> <[email protected]> wrote:
>>>> >
>>>> > Some years ago I had a similar task.
>>>> > What I did:
>>>> > - We had disk arrays with 24 slots, with 4 optional JBODs (24 slots 
>>>> >   each) stacked on top, and dual fiber-optic (LWL) controllers, 4GB 
>>>> >   (costs ;-)
>>>> > - created RAID-6 arrays with no more than 7 disks each
>>>> > - as far as I remember, I had one hot spare for every 4 arrays
>>>> > - connected as many of these RAID bricks together with striped 
>>>> >   GlusterFS as needed
>>>> > - as for replication, I was planning an off-site duplicate of this 
>>>> >   architecture and, because losing data was REALLY not an option, 
>>>> >   writing it all out to LTFS tapes at a second off-site location.
>>>> > As the original LTFS Library Edition was far too expensive for us, 
>>>> > I found an alternative solution that does the same thing at a much 
>>>> > more reasonable price. LTFS is still a big thing in digital archiving.
>>>> > Drop me a note if you would like more details on that.
>>>> >
>>>> > - This way I could fsck all (not too big) arrays in parallel (which 
>>>> >   sped things up)
>>>> > - proper robustness against disk failure
>>>> > - space that could grow without a fixed limit (add more and bigger 
>>>> >   disks) and keep up with access speed (add more servers) at a fairly 
>>>> >   predictable price
>>>> > - LTFS in the vault provided the finishing touch: data stays 
>>>> >   accessible even if two out of three sites are down, at a reasonable 
>>>> >   price (for instance, no heat problem at the tape location)
>>>> > Nowadays I would go for the same approach, except with ZFS raidz3 
>>>> > bricks (or at least test them thoroughly) instead of (small) hardware 
>>>> > RAID bricks.
>>>> > For simplicity and robustness I wouldn't want to end up with several 
>>>> > hundred GlusterFS bricks, each on one individual disk. I would rather 
>>>> > leave disk-failure protection to either hardware RAID or ZFS and use 
>>>> > Gluster to connect these bricks into the filesystem size I need (and 
>>>> > to mirror the whole thing to a second site if needed).
>>>> > hth
>>>> > Bernhard
>>>> >
>>>> >
>>>> >
>>>> > Bernhard Glomm
>>>> > IT Administration
>>>> >
>>>> > Phone: +49 (30) 86880 134
>>>> > Fax: +49 (30) 86880 100
>>>> > Skype: bernhard.glomm.ecologic
>>>> > Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717 
>>>> > Berlin | Germany
>>>> > GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: 
>>>> > DE811963464
>>>> > Ecologic™ is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH
>>>> > ________________________________
>>>> >
>>>> > On Dec 25, 2013, at 8:47 PM, Fredrik Häll <[email protected]> wrote:
>>>> >
>>>> > I am new to Gluster, but so far it seems very attractive for my needs. I 
>>>> > am trying to assess its suitability for a cost-efficient storage problem 
>>>> > I am tackling. Hopefully someone can help me find how to best solve my 
>>>> > problem.
>>>> >
>>>> > Capacity:
>>>> > Start with around 0.5PB usable
>>>> >
>>>> > Redundancy:
>>>> > 2 replicas with non-RAID is not sufficient. Either 3 replicas with 
>>>> > non-raid or some combination of 2 replicas and RAID?
>>>> >
>>>> > File types:
>>>> > Large files, around 400-1500MB each.
>>>> >
>>>> > Usage pattern:
>>>> > Archive (not sure if this matches nearline or not..) with files being 
>>>> > added at around 200-300GB/day (300-400 files/day). Very few reads, order 
>>>> > of 10 file accesses per day. Concurrent reads highly unlikely.
>>>> >
>>>> > The main two factors for me are cost and redundancy. Losing data is not 
>>>> > an option, being an archive solution. Cost/usable TB is the other key 
>>>> > factor, as we see growth estimates of 100-500TB/year.
>>>> >
>>>> > Looking just at $/TB, a RAID-based approach to me sounds more efficient. 
>>>> > But RAID rebuild times with large arrays of large capacity drives sound 
>>>> > really scary. Not sure if something smart can be done since we will 
>>>> > still have a replica left during the rebuild?
>>>> >
>>>> > So, any suggestions on what would be possible and cost-efficient 
>>>> > solutions?
>>>> >
>>>> > - Any experience on dense servers, what is advisable? 24/36/50/60 slots?
>>>> > - SAS expanders/storage pods?
>>>> > - RAID vs non-RAID?
>>>> > - Number of replicas etc?
>>>> >
>>>> > Best,
>>>> >
>>>> > Fredrik
>>>> > _______________________________________________
>>>> > Gluster-users mailing list
>>>> > [email protected]
>>>> > http://supercolony.gluster.org/mailman/listinfo/gluster-users
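For the "3 replicas non-RAID vs 2 replicas plus RAID" question above, a back-of-the-envelope drive count might look like this. It is only a sketch: it assumes 4TB drives and decimal units, and it ignores hot spares, filesystem overhead and growth headroom. The function name and the array widths tried are illustrative choices, not recommendations.

```python
import math


def drives_needed(usable_tb: float, replicas: int, drives_per_array: int,
                  drive_tb: float, parity: int = 2) -> int:
    """Total drive count to reach `usable_tb` of replicated usable space.

    ASSUMPTIONS: identical arrays, whole arrays only, no hot spares,
    no filesystem overhead. parity=0 with drives_per_array=1 models
    plain non-RAID disks.
    """
    data_per_array = (drives_per_array - parity) * drive_tb
    if data_per_array <= 0:
        raise ValueError("array has no data drives")
    arrays_per_replica = math.ceil(usable_tb / data_per_array)
    return arrays_per_replica * drives_per_array * replicas


TARGET_TB = 500  # 0.5PB usable, from the original question

options = {
    "3 replicas, non-RAID": drives_needed(TARGET_TB, 3, 1, 4, parity=0),
    "2 replicas, 7-drive RAID-6": drives_needed(TARGET_TB, 2, 7, 4),
    "2 replicas, 16-drive RAID-6": drives_needed(TARGET_TB, 2, 16, 4),
}
for name, count in options.items():
    print(f"{name}: {count} drives")
```

With these inputs the wider RAID-6 arrays need the fewest drives and 3-way non-RAID replication the most, so on $/TB alone the RAID-based approach does look cheaper; the rebuild-time concern is what pushes back toward smaller arrays.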
>>>> 
>>>> 
>>>> 
>>>> 
>>>> > --
>>>> > Justin Dossey
>>>> > CTO, PodOmatic
>>> 
>>> 
>>> 
>>> -- 
>>> Justin Dossey
>>> CTO, PodOmatic
>> 
> 
> 
> 
> -- 
> Justin Dossey
> CTO, PodOmatic


Re: [Gluster-users] Design/HW for cost-efficient NL archive >= 0.5PB?
