I was hoping to wait until I had this in production before writing it up, but that is still a little while off, and it seems some of this information is needed now.
DRBD is the Distributed Replicated Block Device. Block devices are the things you tend to deal with in big chunks, like disk drives. Linux allows you to do some interesting things with block devices. Specifically, there are layers that do some type of work and then export a block-looking device for you to use. Many of you have probably used the loopback device. This is a piece of software that takes a file and makes it look like a block device you can mount and even put files into. Some may even be aware of the MD software RAID driver, which makes more than one drive look like a single drive. A smaller but growing group knows about LVM and its ability to make multiple drives look like a single block device.

DRBD is similar in that it exports a block device for you to use. The munging it does, though, is making a copy of the underlying storage on a different machine. DRBD is RAID mirroring by way of getting the data off to another machine completely. DRBD on its own is nice: you get your data backed up in real time, and you don't run the risk of the PSU dying in some spectacular way, taking out more than one drive of your RAID array and destroying your working data set.

Where DRBD becomes a great thing to have is when you combine it with Heartbeat. Heartbeat will essentially ping the remote machine to verify it is up and behaving. If the remote machine is down, Heartbeat will take possession of the IP you are exporting services on, fire up a configured list of services, promote your local copy of the DRBD device to primary, and mount it up to continue doing whatever the server is meant to be doing.

In our company, we needed recovery of data quicker than our restore time from backups. The machine of most importance for this is basically just a big file server with NFS running on it. Our push to find something for it came from a botched diagnosis of our problems.
But in that botched diagnosis, some of our attempted solutions showed we hadn't thought our way through this kind of failure and recovery. So in step the fruits of our research: DRBD.

So, our problem is that we need fast recovery of a large NFS file share. For this we ordered 2 Supermicro servers and outfitted them with 2 1TB drives each in a local mirror. Yes, I know this is not big by some people's standards. Each machine has 2GB of memory, dual-core CPUs, and dual NICs. We hooked eth1 on each machine to the other, which gives us a nice private gigabit Ethernet link. This private link is what we configure DRBD and Heartbeat to talk over.

I partitioned each of the machines like this:

  sda1  4GB    swap     sdb1  4GB    swap
  sda2  1GB    /boot    sdb2  1GB    /home
  sda3  5GB    /        sdb3  5GB    /var
  sda4  990GB  \
  sdb4  990GB  /  joined as md0, a RAID 1

I then used DRBD to make md0 on each machine a RAID 1 with the other machine. For lack of a better place to mount drbd0, it is mounted at /media/drbd.

One of the tricks I have learned with Heartbeat is that there are many services with configs in /etc that you will want available all the time. So I create an etc-shared and a var-shared in /media/drbd to store these configs and running information, and I try my best to recreate the normal directory structure there. Then I move the configs from their usual location to this shared location and symlink the old location to it. For instance, the configuration for Bacula (our backup solution), the NFS information, the SSH config, and such all go to live on the DRBD drive. This way, if I make changes on whatever is the primary machine, the secondary will use these same configs when promoted. NFS has /etc/exports, which is of interest in our file server setup, and it also stores state information in /var/lib/nfs. When /var/lib/nfs is accessible to the secondary machine as it is promoted, any running copies or reads continue as if nothing happened.
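For reference, a minimal drbd.conf resource for a layout like this would look roughly like the following. The hostnames, addresses, and port are placeholders rather than our actual config, and the exact syntax depends on your DRBD version:

```
resource r0 {
  protocol C;                     # synchronous: a write completes only
                                  # once the peer has it on disk too
  on filer1 {                     # must match `uname -n` on that node
    device    /dev/drbd0;
    disk      /dev/md0;           # the local software RAID 1
    address   192.168.10.1:7788;  # the private eth1 crossover link
    meta-disk internal;
  }
  on filer2 {
    device    /dev/drbd0;
    disk      /dev/md0;
    address   192.168.10.2:7788;
    meta-disk internal;
  }
}
```

The same file goes on both nodes; each node picks out its own "on" section by hostname.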
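The move-and-symlink trick itself is just a couple of shell commands. Here it is as a sketch you can try safely, using a scratch directory as a stand-in for the real /etc and /media/drbd (which would need root):

```shell
# Demo of the move-and-symlink trick using a scratch tree instead of
# the real /etc and /media/drbd.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/etc" "$ROOT/media/drbd/etc-shared"

# Pretend this is the real /etc/exports on the primary.
echo '/big/share 10.0.0.0/24(rw,sync)' > "$ROOT/etc/exports"

# Move the config onto the shared DRBD mount, mirroring the layout...
mv "$ROOT/etc/exports" "$ROOT/media/drbd/etc-shared/exports"

# ...and leave a symlink behind in the original location.
ln -s "$ROOT/media/drbd/etc-shared/exports" "$ROOT/etc/exports"

# Services still read the file at its old path.
cat "$ROOT/etc/exports"
```

On the real machines you would do the same with /etc/exports, /var/lib/nfs, and so on, so that whichever node mounts /media/drbd sees the current configs.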
I have done the test: copy an ISO image to the cluster, go pull the power cord, watch the failover, and the copy continues, with a perfect copy according to md5sum being the result.

DRBD configuration itself is pretty simple for the normal case; I did little modification of the Debian-supplied default config. Heartbeat, on the other hand, took quite a bit of tweaking. Most of Heartbeat's config is specific to how you want it to talk to the other machines and how fast you want it to decide the other side has gone missing, not just that it is too busy to talk to you. The portion of Heartbeat that was of most importance to my project is the haresources file. I'm still not fully up on the spec for it, but it seems entries with colons in them denote built-in resource scripts and their arguments, while the others are init.d scripts to run. My config sets our local DRBD drive as the primary in the cluster, which means we can now access it. It mounts the filesystem. We then crank up nfs-common and nfs-kernel-server. We also crank up bacula-fd so we can deal with backups. I also created a new ssh init.d script so it could fire up a cluster sshd with configs located on the DRBD drive. We then finally bring up an aliased IP address, which is the one used for contacting the primary machine.

That points out, as well, that when you define a service as being for the cluster, you need to either make a new init script that will start the service on the cluster IP only, or remove the rc links to the normal init script. Removing the links keeps the service from starting until it is able to use the cluster IP. While covered recently, the ssh problem I had was solved wonderfully by having an sshd tied to the static individual IPs of the machines in the cluster, while creating alternate configs and keys for the cluster itself and starting a second sshd with that config when a machine became the primary for the cluster.
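To make that concrete, a haresources entry for a setup like this could look roughly like the following (logically one line; the node name, DRBD resource name, mount point, and IP are illustrative, and ssh-cluster stands for the custom init script mentioned above):

```
filer1 drbddisk::r0 \
       Filesystem::/dev/drbd0::/media/drbd::ext3 \
       nfs-common nfs-kernel-server bacula-fd ssh-cluster \
       IPaddr::10.0.0.50/24/eth0
```

The colon-separated entries are Heartbeat resource scripts with arguments: drbddisk promotes the DRBD device to primary, Filesystem mounts it, and IPaddr brings up the aliased cluster IP. The bare names are ordinary init.d scripts, started in order on takeover and stopped in reverse on release.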
So now that I have given a quick primer on the tech and why I needed it, I should give some more specs. Write speed to the local drive, even though the data then has to be copied to the remote machine via GigE and written there as well, is pretty decent. I think we achieved near local-drive performance, but that might just be because writing to 2 local drives mirrored in software slows things down about as much as copying to the remote machine and having it write as well. Reads are done locally, so they weren't really affected. For us as a whole, the true speed limit on the NFS mount is usually the internet connection anyway, so the speeds are way faster than our uplink.

Overall, Heartbeat was more of a pain than DRBD to set up and tune properly, but neither is all that hard to handle. Once you get into the mindset of how to use them, it becomes very easy to start expanding the idea to get more things accomplished. And the fact that all of this was accomplished with an off-the-shelf Debian install was loads of help.

As for our exposure to failures: we started with a Dell 2450 with 2 P3 860MHz CPUs, a gig of RAM, and 4 73GB SCSI drives in a RAID 5 array, giving about 200GB of drive space and only 2 drive failures away from full data loss needing a restore from our backup solution. We are now working with 2 Supermicros with 2GB of RAM each, 2 3GHz cores each, and 2 1TB drives mirrored in each. Now we need to lose all 4 drives before we have to roll back to the backup solution. We had used 2U of rack space for the 2450, and we now use 2U for both machines. Power usage is up a little due to the increased support hardware for the drives, but our tolerance for failure before data loss has gone way up. We may even look into doing some of these same things to other machines. We have even thought about trying this out with PostgreSQL as a cheap replication system.
Essentially, that would make recovery not really any different from a power failure: in-progress queries are lost, but a reconnect and you are back running again.

Oh well. This was more of a ramble than a good write-up, but maybe it will give you some ideas as to what can be accomplished and make your own installs more resilient to failure.
--
Steven Critchfield [EMAIL PROTECTED]

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "NLUG" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/nlug-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
