On Mon, Oct 13, 2008 at 3:13 PM, Mark J Bailey <[EMAIL PROTECTED]> wrote:

> Steven,
>
> A hearty thanks for this! I had started once again looking into this for possible use at a customer's site in Brentwood. This is a welcome recap from another "known" (!) perspective! :-)
>
> Thanks again!
>
> Mark
>
> --On Monday, October 13, 2008 2:18 PM -0500 "Steven S. Critchfield" <[EMAIL PROTECTED]> wrote:
>
>> I was hoping to wait until I put this in production before writing it up, but that is still a little while off, and it seems some of this information is needed by you all now.
>>
>> DRBD is the Distributed Replicated Block Device.
>>
>> Block devices are the things you tend to deal with in big chunks, like disk drives. Linux lets you do some interesting things with block devices. Specifically, there are layers that do some kind of work and export a block-looking device for you to use.
>>
>> Many of you have probably used the loopback device. It is a piece of software that takes a file and makes it look like a block device you can mount and even put files into.
>>
>> Some may be aware of the MD software RAID driver, which makes more than one drive look like a single drive.
>>
>> A smaller but growing group knows about LVM and its ability to make multiple drives look like a single block device.
>>
>> DRBD is similar in that it exports a block device for you to use. The munging it does, though, is keeping a copy of the underlying storage on a different machine. DRBD is RAID mirroring by way of getting the data off to another machine completely.
>>
>> DRBD on its own is nice. Your data is backed up in real time, and you don't run the risk of the PSU dying in some spectacular way, taking out more than one drive of your RAID array and destroying your working data set.
>>
>> Where DRBD becomes a great thing to have is when you combine it with Heartbeat. Heartbeat essentially pings the remote machine to verify it is up and behaving. If the other side is down, Heartbeat takes possession of the IP you export services on, fires up a configured list of services, converts the local copy of the DRBD device to primary, and mounts it to continue doing whatever the server is meant to be doing.
>>
>> In our company, we needed recovery of data quicker than our restore time from backups. The machine that mattered most for this is basically just a big file server running NFS. Our push to find something for it came from a botched diagnosis of our problems, but in that botched diagnosis, some of our attempted solutions showed we hadn't thought our way through this kind of failure and recovery. So in step the fruits of our research: DRBD.
>>
>> So, our problem is that we need fast recovery of a large NFS file share. For this we ordered two Supermicro servers and outfitted each with two 1 TB drives in a local mirror. Yes, I know this is not big by some people's standards. Each machine has 2 GB of memory, dual-core CPUs, and dual NICs. We hooked eth1 on each machine to the other, giving us a nice private gigabit Ethernet link. This private link is what we configure DRBD and Heartbeat to talk over.
>>
>> I partitioned each of the machines like this:
>>
>>   sda1    4 GB   swap
>>   sdb1    4 GB   swap
>>   sda2    1 GB   /boot
>>   sdb2    1 GB   /home
>>   sda3    5 GB   /
>>   sdb3    5 GB   /var
>>
>> The two below are joined as md0 in a RAID 1:
>>
>>   sda4  990 GB
>>   sdb4  990 GB
>>
>> I then used DRBD to make md0 on each machine a RAID 1 with the other machine, and for lack of a better place to mount drbd0, it is mounted at /media/drbd.
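>>
>> To make the layering concrete, building the local mirror is just ordinary mdadm. This is a sketch rather than a paste of our exact commands, so check the flags against your version:
>>
>>   # create the local RAID 1 from the two big partitions
>>   # (device names are from the layout above)
>>   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4
>>
>>   # watch the initial sync
>>   cat /proc/mdstat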
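>>
>> The DRBD resource definition then points at md0 on each box. Roughly like this; the hostnames and private-link addresses below are placeholders, not our real ones, and the exact stanza names depend on your DRBD version:
>>
>>   resource r0 {
>>     protocol C;                    # fully synchronous writes
>>     on node1 {                     # placeholder; must match `uname -n`
>>       device    /dev/drbd0;
>>       disk      /dev/md0;          # the local mirror built above
>>       address   192.168.1.1:7788;  # the private eth1 link
>>       meta-disk internal;
>>     }
>>     on node2 {
>>       device    /dev/drbd0;
>>       disk      /dev/md0;
>>       address   192.168.1.2:7788;
>>       meta-disk internal;
>>     }
>>   }
>>
>> After that it is the usual dance: create the metadata, bring the resource up on both machines, promote one side to primary, mkfs, and mount it.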
>> One of the tricks I have learned with Heartbeat is that many services have configs in /etc that you will want available all the time. So I create an etc-shared and a var-shared in /media/drbd to store these configs and running information, and I try my best to recreate the normal directory structure there. Then I move the configs from their usual location to the shared location and symlink the old location to the new one.
>>
>> For instance, the configuration for Bacula (our backup solution), the NFS information, the SSH config, and such all live on the DRBD drive. This way, if I make changes on whichever machine is primary, the secondary will use those same configs when it is promoted.
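>>
>> The relocation itself is nothing fancy, just move-and-symlink. As an illustration (paths assumed, services stopped first):
>>
>>   # recreate the normal layout under the shared device
>>   mkdir -p /media/drbd/etc-shared /media/drbd/var-shared/lib
>>
>>   # move a config over and leave a symlink at the old location
>>   mv /etc/exports /media/drbd/etc-shared/exports
>>   ln -s /media/drbd/etc-shared/exports /etc/exports
>>
>>   # same trick for state directories
>>   mv /var/lib/nfs /media/drbd/var-shared/lib/nfs
>>   ln -s /media/drbd/var-shared/lib/nfs /var/lib/nfs
>>
>> Remember the symlinks themselves live on the local disks, so they have to be created on both machines; only what is behind them is shared.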
>> NFS has /etc/exports, which is of interest in our file-server setup. NFS also stores state information in /var/lib/nfs. When /var/lib/nfs is accessible to the secondary machine as it is promoted, any in-flight copies or reads continue as if nothing happened. I have copied an ISO image to the cluster, pulled the power cord mid-copy, watched the failover, and had the copy finish perfectly according to md5sum.
>>
>> DRBD configuration itself is pretty simple for the normal case. I did little modification of the Debian-supplied default config.
>>
>> Heartbeat, on the other hand, took quite a bit of tweaking. Most of Heartbeat's config is specific to how you want it to talk to the other machines, and how quickly you want it to decide the other side has gone missing rather than just being too busy to talk to you.
>>
>> The portion of Heartbeat most important to my project is the haresources file. I'm still not fully up on its spec, but it seems entries with colons in them denote built-in commands with parameters, and the others are init.d scripts to run (a sketch follows below). My config sets our local DRBD drive as the primary in the cluster, which means we can now access it. It mounts the filesystem. We then crank up nfs-common and nfs-kernel-server. We also crank up bacula-fd so we can deal with backups. I also created a new ssh init.d script so it can fire up a cluster sshd with configs located on the DRBD drive. Finally, we bring up an aliased IP address, the one used for contacting the primary machine.
>>
>> That points out something else: when you define a service as belonging to the cluster, you need to either write a new init script that starts the service on the cluster IP only, or remove the rc links to the init script. Removing the links keeps the service from starting until it is able to use the cluster IP.
>>
>> While covered recently, the ssh problem I had was solved wonderfully by having an sshd tied to the static individual IPs of each machine in the cluster, while creating alternate configs and keys for the cluster itself and starting an sshd with that config when a machine becomes the primary for the cluster.
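>>
>> To make the Heartbeat pieces concrete: the talk-to-each-other and how-dead-is-dead knobs live in /etc/ha.d/ha.cf. Something like the following; the values and node names are illustrative, not our production numbers:
>>
>>   # /etc/ha.d/ha.cf
>>   keepalive 2              # seconds between heartbeats
>>   deadtime 10              # declare the peer dead after this long
>>   warntime 5               # log late heartbeats before that
>>   ucast eth1 192.168.1.2   # talk over the private link
>>   auto_failback on
>>   node node1               # must match `uname -n`
>>   node node2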
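>>
>> And the promised haresources sketch. Everything after the node name is a resource: entries with :: take parameters and resolve to scripts in /etc/ha.d/resource.d or /etc/init.d. The names, filesystem type, and cluster IP here are made up:
>>
>>   # /etc/ha.d/haresources (all one line)
>>   node1 drbddisk::r0 Filesystem::/dev/drbd0::/media/drbd::ext3 nfs-common nfs-kernel-server bacula-fd ssh-cluster IPaddr::192.168.0.100/24/eth0
>>
>> Read left to right, that is the promotion order described above: become primary, mount, start NFS, bacula-fd, and the cluster sshd, then bring up the aliased IP.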
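>>
>> The cluster sshd is just a second instance with its own config and keys on the shared drive. A sketch, with made-up paths and the same placeholder cluster IP:
>>
>>   # /media/drbd/etc-shared/ssh/sshd_config (the cluster instance)
>>   ListenAddress 192.168.0.100   # bind to the cluster IP only
>>   HostKey /media/drbd/etc-shared/ssh/ssh_host_rsa_key
>>   PidFile /var/run/sshd-cluster.pid
>>
>>   # the ssh-cluster init script boils down to
>>   /usr/sbin/sshd -f /media/drbd/etc-shared/ssh/sshd_config
>>
>> The machines' own sshds stay tied to their static IPs via ListenAddress in the normal /etc/ssh/sshd_config.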
>> So now that I have given a quick primer on the tech and why I needed it, I should give some more specs.
>>
>> Write speed is pretty decent, even though every write also has to be copied to the remote machine over the GigE link and written there. I think we achieved near local-drive performance, but that might just be because writing to two local drives mirrored in software slows things down about as much as shipping the data to the remote machine and having it write as well. Reads are done locally, so they weren't really affected. For us, the true speed limit on the NFS mount is usually the Internet connection anyway, so the speeds are way faster than our uplink.
>>
>> Overall, Heartbeat was more of a pain than DRBD to set up and tune properly, but neither is all that hard to handle. Once you get into the mindset of how to use them, it becomes very easy to start expanding the idea to get more things accomplished. And the fact that all of this was done with an off-the-shelf Debian install was loads of help.
>>
>> As for our exposure to failures:
>>
>> We started with a Dell 2450 with two 860 MHz P3 CPUs, a gig of RAM, and four 73 GB SCSI drives in a RAID 5 array, giving about 200 GB of drive space and leaving us only two drive failures away from total data loss and a restore from our backup solution.
>>
>> We now have two Supermicros with 2 GB of RAM each, two 3 GHz cores each, and two 1 TB drives mirrored in each. Now we would need to lose all four drives before rolling back to the backup solution.
>>
>> The 2450 took 2U of rack space; the two new machines together also take 2U. Power usage is up a little due to the extra support hardware for the drives, but our tolerance for failure before data loss has gone way up.
>>
>> We may look into doing some of these same things for other machines. We have even thought about trying this out with PostgreSQL as a cheap replication system, essentially making recovery no different from a power failure: in-progress queries are lost, but reconnect and you are back running again.
>>
>> Oh well. This was more of a ramble than a good write-up, but maybe it will give you some ideas of what can be accomplished and make your own installs more resilient to failure.
>>
>> --
>> Steven Critchfield  [EMAIL PROTECTED]
>
> ________________________________________________________
> Mark J. Bailey                    Jobsoft Design & Development, Inc.
> 104 Arlington Place, Suite 100    Franklin, TN 37064
> EMAIL: [EMAIL PROTECTED]          WEB: http://www.jobsoft.com/
> VOICE: (615) 904-9559   FAX: (615) 904-9576   CELL: (615) 308-9099

I thought it was a pretty good introductory writeup. It doesn't directly influence anything I'm working on right now, but it was nice to see a general description and application of some of the programs you're using. $Thanks++
