I was hoping to wait until I had this in production before writing it up, but that is still a little while off, and it seems some of this information is needed now.
DRBD is the Distributed Replicated Block Device. Block devices are the things you tend to deal with in big chunks, like disk drives. Linux allows you to do some interesting things with block devices. Specifically, there are layers that do some type of work and then export a block-looking device for you to use. Many of you have probably used the loopback device. This is a piece of software that takes a file and makes it look like a block device you can mount and even put files into. Some may even be aware of the MD software RAID driver, which makes more than one drive look like a single drive. A smaller but growing group knows about LVM and its ability to make multiple drives look like a single block device.

DRBD is similar in that it exports a block device for you to use. The munging it does, though, is making a copy of the underlying storage on a different machine. DRBD is RAID mirroring by way of getting the data off to another machine completely. DRBD on its own is nice: you get your data backed up in real time, and you don't run the risk of the PSU dying in some spectacular way, taking out more than one drive of your RAID array and destroying your working data set.

Where DRBD becomes a great thing to have is when you combine it with Heartbeat. Heartbeat will essentially ping the remote machine to verify it is up and behaving. If the remote machine is down, Heartbeat will take possession of the IP you are exporting services on, fire up a configured list of services, promote your local copy of the DRBD device to primary, and mount it up to continue doing whatever the server is meant to be doing.

In our company, we needed recovery of data quicker than our restore time from backups. The machine of most importance for this is basically just a big file server with NFS running on it. Our push to find something for it came from a botched diagnosis of our problems.
But in that botched diagnosis, some of our attempted solutions showed we hadn't thought our way through this kind of failure and recovery. So in step the fruits of our research: DRBD.

So, our problem is that we need fast recovery of a large NFS file share. For this we ordered 2 Supermicro servers and outfitted them with 2 1TB drives each in a local mirror. Yes, I know this is not big by some people's standards. Each machine has 2GB of memory, dual-core CPUs, and dual NICs. We hooked eth1 on each machine to the other, which gives us a nice private gigabit Ethernet link. This private link is what we configure DRBD and Heartbeat to talk over.

I partitioned each of the machines like this:

  sda1  4GB    swap     sdb1  4GB    swap
  sda2  1GB    /boot    sdb2  1GB    /home
  sda3  5GB    /        sdb3  5GB    /var
  sda4  990GB  \
  sdb4  990GB  /  joined as md0, a RAID 1

I then used DRBD to make md0 on each machine a RAID 1 with the other machine. For lack of a better place to mount drbd0, it is mounted at /media/drbd.

One of the tricks I have learned with Heartbeat is that there are many services with configs in /etc that you will want available all the time. So I create an etc-shared and a var-shared in /media/drbd to store these configs and running information, and I try my best to recreate the normal directory structure there. Then I move the configs from their usual location to this shared location and symlink the old location to it. For instance, the configuration for Bacula (our backup solution), the NFS information, the SSH config, and such all go to live on the DRBD drive. This way, if I make changes on whatever is the primary machine, the secondary will use these same configs when promoted. NFS has /etc/exports, which is of interest in our file server setup, and it also stores state information in /var/lib/nfs. When /var/lib/nfs is accessible to the secondary machine as it is promoted, any running copies or reads continue as if nothing happened.
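For reference, a minimal drbd.conf resource for a layout like this would look roughly like the following. The hostnames, addresses, and port are placeholders rather than our actual config, and the exact syntax depends on your DRBD version:

```
resource r0 {
  protocol C;                     # synchronous: a write completes only
                                  # once the peer has it on disk too
  on filer1 {                     # must match `uname -n` on that node
    device    /dev/drbd0;
    disk      /dev/md0;           # the local software RAID 1
    address   192.168.10.1:7788;  # the private eth1 crossover link
    meta-disk internal;
  }
  on filer2 {
    device    /dev/drbd0;
    disk      /dev/md0;
    address   192.168.10.2:7788;
    meta-disk internal;
  }
}
```

The same file goes on both nodes; each node picks out its own "on" section by hostname.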
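The move-and-symlink trick itself is just a couple of shell commands. Here it is as a sketch you can try safely, using a scratch directory as a stand-in for the real /etc and /media/drbd (which would need root):

```shell
# Demo of the move-and-symlink trick using a scratch tree instead of
# the real /etc and /media/drbd.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/etc" "$ROOT/media/drbd/etc-shared"

# Pretend this is the real /etc/exports on the primary.
echo '/big/share 10.0.0.0/24(rw,sync)' > "$ROOT/etc/exports"

# Move the config onto the shared DRBD mount, mirroring the layout...
mv "$ROOT/etc/exports" "$ROOT/media/drbd/etc-shared/exports"

# ...and leave a symlink behind in the original location.
ln -s "$ROOT/media/drbd/etc-shared/exports" "$ROOT/etc/exports"

# Services still read the file at its old path.
cat "$ROOT/etc/exports"
```

On the real machines you would do the same with /etc/exports, /var/lib/nfs, and so on, so that whichever node mounts /media/drbd sees the current configs.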
I have done the test: copy an ISO image to the cluster, go pull the power cord, watch the failover, and the copy continues, with a perfect copy according to md5sum being the result.

DRBD configuration itself is pretty simple for the normal case; I did little modification of the Debian-supplied default config. Heartbeat, on the other hand, took quite a bit of tweaking. Most of Heartbeat's config is specific to how you want it to talk to the other machines and how fast you want it to decide the other side has gone missing, not just that it is too busy to talk to you. The portion of Heartbeat that was of most importance to my project is the haresources file. I'm still not fully up on the spec for it, but it seems entries with colons in them denote built-in resource scripts and their arguments, while the others are init.d scripts to run. My config sets our local DRBD drive as the primary in the cluster, which means we can now access it. It mounts the filesystem. We then crank up nfs-common and nfs-kernel-server. We also crank up bacula-fd so we can deal with backups. I also created a new ssh init.d script so it could fire up a cluster sshd with configs located on the DRBD drive. We then finally bring up an aliased IP address, which is the one used for contacting the primary machine.

That points out, as well, that when you define a service as being for the cluster, you need to either make a new init script that will start the service on the cluster IP only, or remove the rc links to the normal init script. Removing the links keeps the service from starting until it is able to use the cluster IP. While covered recently, the ssh problem I had was solved wonderfully by having an sshd tied to the static individual IPs of the machines in the cluster, while creating alternate configs and keys for the cluster itself and starting a second sshd with that config when a machine became the primary for the cluster.
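To make that concrete, a haresources entry for a setup like this could look roughly like the following (logically one line; the node name, DRBD resource name, mount point, and IP are illustrative, and ssh-cluster stands for the custom init script mentioned above):

```
filer1 drbddisk::r0 \
       Filesystem::/dev/drbd0::/media/drbd::ext3 \
       nfs-common nfs-kernel-server bacula-fd ssh-cluster \
       IPaddr::10.0.0.50/24/eth0
```

The colon-separated entries are Heartbeat resource scripts with arguments: drbddisk promotes the DRBD device to primary, Filesystem mounts it, and IPaddr brings up the aliased cluster IP. The bare names are ordinary init.d scripts, started in order on takeover and stopped in reverse on release.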
So now that I have given a quick primer on the tech and why I needed it, I should give some more specs. Write speed to the local drive, even though the data then has to be copied to the remote machine via GigE and written there as well, is pretty decent. I think we achieved near local-drive performance, but that might just be because writing to 2 local drives mirrored in software slows things down about as much as copying to the remote machine and having it write as well. Reads are done locally, so they weren't really affected. For us as a whole, the true speed limit on the NFS mount is usually the internet connection anyway, so the speeds are way faster than our uplink.

Overall, Heartbeat was more of a pain than DRBD to set up and tune properly, but neither is all that hard to handle. Once you get into the mindset of how to use them, it becomes very easy to start expanding the idea to get more things accomplished. And the fact that all of this was accomplished with an off-the-shelf Debian install was loads of help.

As for our exposure to failures: we started with a Dell 2450 with 2 P3 860MHz CPUs, a gig of RAM, and 4 73GB SCSI drives in a RAID 5 array, giving about 200GB of drive space and only 2 drive failures away from full data loss needing a restore from our backup solution. We are now working with 2 Supermicros with 2GB of RAM each, 2 3GHz cores each, and 2 1TB drives mirrored in each. Now we need to lose all 4 drives before we have to roll back to the backup solution. We had used 2U of rack space for the 2450, and we now use 2U for both machines. Power usage is up a little due to the increased support hardware for the drives, but our tolerance for failure before data loss has gone way up. We may even look into doing some of these same things to other machines. We have even thought about trying this out with PostgreSQL as a cheap replication system.
Essentially, that would make recovery not really any different from a power failure: in-progress queries are lost, but a reconnect and you are back running again.

Oh well. This was more of a ramble than a good write-up, but maybe it will give you some ideas as to what can be accomplished and make your own installs more resilient to failure.
--
Steven Critchfield [EMAIL PROTECTED]

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "NLUG" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/nlug-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
