Re: [Gluster-users] Inviting comments on my plans

Fernando Frediani (Qube) Mon, 19 Nov 2012 02:19:06 -0800

Hi,

I agree with the comment about Fedora and wouldn't choose it as a distribution, 
but if you are comfortable with it, go ahead, as I don't think it will be your 
biggest pain point.

RAID: I see where you are coming from in choosing not to have any RAID, and I 
have thought of doing the same myself, mainly for performance reasons. But, as 
mentioned, how are you going to handle drive swaps? If you think you can 
somehow automate them, please share with us, as I believe running the disks 
independently is a major performance gain.

I am not quite sure that what you want to do with XFS+BTRFS will work as you 
expect. Ideally you need to take snapshots at the level of the distributed 
filesystem; otherwise you might think you are getting a consistent copy of the 
data when you are not, since you are not supposed to read or write anywhere 
other than on the Gluster mount.

Performance: Simple and short - if you can sacrifice one disk per host AND 
choose not to go with independent disks (no RAID), go with RAID 5.
As your system grows, the reads and writes should (in theory) be distributed 
across all bricks. If a disk fails you can easily replace it, and even in the 
unlikely event that you lose two disks in a server and lose its data 
entirely, you still have a copy of it in another place and can rebuild it with a 
bit of patience, so no data loss.
Also, we have had more than enough reports of bad performance in Gluster for all 
kinds of configurations (including RAID 10), so I don't think anyone should 
expect Gluster to perform that well. Using RAID 5, 6 or 10 underneath 
shouldn't make much difference, and RAID 10 would only waste space. If you are 
storing bulk data (multimedia, images, big files), great: it will be streamed, 
sequential data and performance should be acceptable. But if you are storing 
things that do a lot of small I/O, or virtual machines, I'm not sure Gluster is 
the best choice for you and you should think carefully about it.

Fernando

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Brian Candler
Sent: 18 November 2012 12:19
To: Shawn Heisey
Cc: [email protected]
Subject: Re: [Gluster-users] Inviting comments on my plans

On Sat, Nov 17, 2012 at 11:04:33AM -0700, Shawn Heisey wrote:
> Dell R720xd servers with two internal OS drives and 12 hot-swap 
> external 3.5 inch bays.  Fedora 18 alpha, to be upgraded to Fedora
> 18 when it is released.

I would strongly recommend *against* Fedora in any production environment, 
simply because there are new releases every 6 months, and releases are only 
supported for 18 months from release.  You are therefore locked into a complete 
OS reinstall every 6 months (or at best, three upgrades every 18 months).

If you want something that's free and RPM-based for production, I suggest you 
use CentOS or Scientific Linux.

> 2TB simple LVM volumes for bricks.
> A combination of 4TB disks (two bricks per drive) and 2TB disks.

With no RAID, 100% reliant on gluster replication? You discussed this later but 
I would still advise against this.  If you go this route, you will need to be 
very sure about your procedures for (a) detecting failed drives, and
(b) replacing failed drives.  It's certainly not a simple pull-out/push-in (or 
rebuild-on-hot-spare) as it would be with RAID.  You'll have to introduce a new 
drive, create the filesystem (or two filesystems on a 4TB drive), and 
reintroduce those filesystems as bricks into gluster: but not using 
replace-brick because the failed brick will have gone.  So you need to be 
confident in the abilities of your operational staff to do this.

If you do it this way, please test and document it for the rest of us.

> Now for the really controversial part of my plans: Left-hand brick 
> filesystems (listed first in each replica set) will be XFS, right-hand 
> bricks will be BTRFS.  The idea here is that we will have one copy of 
> the volume on a fully battle-tested and reliable filesystem, and 
> another copy of the filesystem stored in a way that we can create 
> periodic snapshots for last-ditch "oops" recovery.
> Because of the distributed nature of the filesystem, using those 
> snapshots will not be straightforward, but it will be POSSIBLE.

Of course it depends on your HA requirements, but another approach would be to 
have a non-replicated volume (XFS) and then geo-replicate to another server with 
BTRFS, and do your snapshotting there. Then your "live" data is not dependent 
on BTRFS issues.

This also has the bonus that your BTRFS server could be network-remote.

> * Performance.
> RAID 5/6 comes with a severe penalty on performance during sustained 
> writes -- writing more data than will fit in your RAID controller's 
> cache memory.  Also, if you have a failed disk, all performance is 
> greatly impacted during the entire rebuild process, which for a 4TB 
> disk is likely to take a few days.

Actually, sustained sequential writes are the best case for RAID5/6. It's 
random writes which will kill you.
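
As a rough illustration of that difference, here is a minimal Python sketch. 
It assumes the textbook read-modify-write model for parity RAID and ignores 
controller cache entirely; the function names and the 12-disk example are 
made up for illustration and do not come from any tool.

```python
def full_stripe_write_ios(n_disks: int, parity_disks: int) -> float:
    """Disk I/Os per data chunk when a whole stripe is written at once.

    Parity is computed from the new data alone, so nothing has to be read
    back: n_disks writes carry (n_disks - parity_disks) chunks of data.
    """
    data_disks = n_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return n_disks / data_disks


def small_random_write_ios(parity_disks: int) -> int:
    """Disk I/Os for one small write that touches a single chunk.

    Read-modify-write: read the old data chunk and each old parity chunk,
    then write the new data chunk and each new parity chunk.
    """
    return 2 * (1 + parity_disks)


if __name__ == "__main__":
    n = 12  # assumed: one array spanning all 12 bays
    for name, parity in (("RAID5", 1), ("RAID6", 2)):
        seq = full_stripe_write_ios(n, parity)
        rnd = small_random_write_ios(parity)
        print(f"{name}: {seq:.2f} I/Os per chunk sequential, "
              f"{rnd} I/Os per chunk random")
```

Under this model a full-stripe write costs only slightly more than one disk 
I/O per chunk of data, while a small random write costs 4 I/Os on RAID5 and 
6 on RAID6. Real controllers with write-back cache will blur these figures.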

If random write performance is important I'd use RAID10 - which means for a 
fully populated server you'll get 24TB instead of 48TB.  Linux mdraid "far 2" 
layout will give you the same read performance as RAID0, indeed somewhat faster 
because all the seeks are within the first half of the drive, but with data 
replication.
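
The capacity figures can be checked with a short Python sketch. It assumes 
all 12 hot-swap bays hold 4TB disks in a single array, and it ignores 
filesystem overhead, hot spares and any Gluster replication on top; the 
function and layout names are hypothetical.

```python
def usable_tb(n_disks: int, disk_tb: float, layout: str) -> float:
    """Usable capacity in TB of one array of identical disks."""
    if layout == "none":      # independent disks, no redundancy
        return n_disks * disk_tb
    if layout == "raid5":     # one disk's worth of parity
        return (n_disks - 1) * disk_tb
    if layout == "raid6":     # two disks' worth of parity
        return (n_disks - 2) * disk_tb
    if layout == "raid10":    # every block stored twice
        return n_disks * disk_tb / 2
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    for layout in ("none", "raid5", "raid6", "raid10"):
        print(f"{layout:>6}: {usable_tb(12, 4, layout):.0f} TB")
```

For a fully populated server this gives 48TB with no RAID, 44TB with RAID5, 
40TB with RAID6 and 24TB with RAID10, per server and before replication.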

With georeplication, your BTRFS backup server could be RAID5 or RAID6 though.

So it's down to the relative importance of various things:
- sufficient capacity
- sufficient performance
- acceptable cost
- ease of management (when a drive fails)
- data availability (if an entire server fails)

For me, "ease of management (when a drive fails)" comes very high on the list, 
because drive failures *will* happen, and you need to deal with them as a 
matter-of-course. You might not feel the same way.

I wrote "sufficient capacity/performance" rather than "maximum 
capacity/performance" because it depends what your business requirements are.  
I mean, having no RAID might give you maximum performance on those 4TB drives, 
but is even that good enough for your needs?  If not, you might want to revisit 
and go with SSDs.  On the other hand, RAID6 might not give the *best* write 
performance, but it might actually be good enough depending on what you're 
doing.

Regards,

Brian.
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users