Thanks for the write-up. This is a good introduction to the subject and made me think about some things I hadn't considered before.
Chris

Steven S. Critchfield wrote:

I was hoping to wait until I put this in production before writing it up, but that is still a little while off, and it seems some of this information is needed for you all.

DRBD is the Distributed Replicated Block Device.

Block devices are those things you tend to deal with in big chunks, like disk drives. Linux allows you to do some interesting things with block devices. Specifically, there are layers that do some type of work and export a block-looking device for you to use.

Many of you have probably used the loopback device. This is a piece of software that takes a file and makes it look like a block device you can mount and even put files into.

Some may even be aware of the MD software RAID driver. This makes more than one drive look like a single drive.

Probably fewer, but a growing group, know about LVM and its ability to make multiple drives look like a single block device.

DRBD is similar in that it exports a block device for you to use. The munging it does, though, is in making a copy of the underlying storage on a different machine. DRBD is RAID mirroring by way of getting the data off to another machine completely.

DRBD on its own is nice. You get your data backed up in real time, and you don't run the risk of the PSU dying in some spectacular way, taking out more than one drive of your RAID array and destroying your working data set.

Where DRBD becomes a great thing to have is when you combine it with Heartbeat. Heartbeat will essentially ping the remote machine to verify it is up and behaving. If it is down, it will take possession of the IP you are exporting services on, fire up a configured list of services, promote the local copy of the DRBD drive to primary, and mount it up to continue doing whatever the server is meant to be doing.
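For a concrete picture, a minimal two-node DRBD resource definition for a setup like the one described might look roughly like this. The hostnames, addresses, and resource name are placeholders I made up, not the author's actual config; treat it as a sketch of the drbd.conf style, not a drop-in file:

```
# /etc/drbd.conf -- minimal sketch; hostnames, IPs, and the backing
# device are illustrative placeholders.
resource r0 {
  protocol C;            # synchronous: a write completes on both nodes
  syncer { rate 100M; }  # cap resync bandwidth on the private GigE link

  on node-a {
    device    /dev/drbd0;
    disk      /dev/md0;          # local software RAID 1 backs the DRBD device
    address   192.168.10.1:7788; # eth1, the private crossover link
    meta-disk internal;
  }
  on node-b {
    device    /dev/drbd0;
    disk      /dev/md0;
    address   192.168.10.2:7788;
    meta-disk internal;
  }
}
```

Protocol C is the fully synchronous mode, which is what you want if the secondary must be able to take over with no lost writes.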
In our company, we needed recovery of data quicker than our restore time from backups. The machine of most importance for this is basically just a big file server running NFS. Our push to find something for it came from a botched diagnosis of our problems. But in this botched diagnosis, some of our attempted solutions showed we hadn't thought our way through this kind of failure and recovery. So in step the fruits of our research: DRBD.

So, our problem is that we need fast recovery of a large NFS file share. For this we ordered two Supermicro servers. We outfitted them with two 1TB drives each in a local mirror. Yes, I know this is not big by some people's standards. Each machine has 2GB of memory, dual-core CPUs, and dual NICs. We hooked eth1 on each machine to the other, giving us a nice private gigabit Ethernet link. This private link is what we configure DRBD and Heartbeat to talk over.

I partitioned each of the machines like this:

sda1 4GB swap
sdb1 4GB swap
sda2 1GB /boot
sdb2 1GB /home
sda3 5GB /
sdb3 5GB /var

The two below are joined as md0 in a RAID 1:

sda4 990GB
sdb4 990GB

I then used DRBD to make md0 on each machine a RAID 1 with the other machine. For lack of a better place to mount drbd0, it is mounted at /media/drbd.

One of the tricks I have learned with Heartbeat is that there are many services whose configs in /etc you will want available all the time. So I create an etc-shared and a var-shared in /media/drbd to store these configs and their running information, recreating the original directory structure there as best I can. Then I move the configs from their normal location to this shared location and link the old location to it.

For instance, the configuration for Bacula (our backup solution), the NFS information, the SSH config, and such all go to live on the DRBD drive.
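The move-and-symlink trick described above can be sketched as follows. The paths here run inside a throwaway sandbox directory so the demo is safely runnable; on the real servers the same pattern would apply to /etc and /media/drbd (the exact layout is my assumption, not the author's):

```shell
#!/bin/sh
set -e
# Sketch of relocating a config onto the DRBD-backed mount and leaving
# a symlink behind. ROOT is a sandbox so this demo touches nothing real;
# in production ROOT would be / and DRBD_MNT the mounted drbd0 device.
ROOT=$(mktemp -d)
DRBD_MNT="$ROOT/media/drbd"

mkdir -p "$ROOT/etc" "$DRBD_MNT/etc-shared"
echo "/srv/share *(rw,sync)" > "$ROOT/etc/exports"   # stand-in NFS config

# Move the config onto the shared DRBD mount, then symlink the old
# location to it so the service still finds it on whichever node is
# currently primary.
mv "$ROOT/etc/exports" "$DRBD_MNT/etc-shared/exports"
ln -s "$DRBD_MNT/etc-shared/exports" "$ROOT/etc/exports"

# The original path still works:
cat "$ROOT/etc/exports"
```

Because the link target lives on the DRBD device, both nodes resolve the same file once the device is mounted, which is exactly why a promoted secondary picks up identical configs.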
This way, if I make changes on whatever is the primary machine, the secondary will use the same configs when promoted.

NFS has /etc/exports, which is of interest in our file-server setup. NFS also stores state information in /var/lib/nfs. When /var/lib/nfs is accessible to the secondary machine as it is promoted, any running copies or reads continue as if nothing happened. I have copied an ISO image to the cluster, gone and pulled the power cord, watched the failover, and seen the copy continue, with a perfect copy according to md5sum as the result.

DRBD configuration itself is pretty simple for the normal case. I did little modification of the Debian-supplied default config.

Heartbeat, on the other hand, took quite a bit of tweaking. Most of Heartbeat's config is specific to how you want it to talk to the other machines and how fast you want it to decide the other side has gone missing, rather than just being too busy to talk to you.

The portion of Heartbeat that was of most importance to my project is the haresources file. I'm still not fully up on its specs, but it seems that entries containing colons denote built-in resource scripts, and the others name init.d scripts to run. My config sets our local DRBD drive as the primary in the cluster, which means we can now access it. It mounts the filesystem. We then crank up nfs-common and nfs-kernel-server. We also crank up bacula-fd so we can deal with backups. I also created a new ssh init.d script that fires up a cluster sshd with its configs located on the DRBD drive. We then finally fire up an aliased IP address, which is the one used for contacting the primary machine.

I guess that points out as well that when you define a service as belonging to the cluster, you need to either write a new init script that starts the service on the cluster IP only, or remove the rc links to the stock init script.
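A haresources line for the sequence described above might look roughly like this. The node name, resource name, device, mount point, and IP are placeholders of mine, and "ssh-cluster" stands in for whatever the author named the custom ssh init script:

```
# /etc/ha.d/haresources -- one logical line; entries with :: are
# Heartbeat resource scripts taking parameters, bare names are init.d
# scripts. All values here are illustrative.
node-a drbddisk::r0 \
       Filesystem::/dev/drbd0::/media/drbd::ext3 \
       nfs-common nfs-kernel-server bacula-fd ssh-cluster \
       IPaddr::192.168.0.10/24/eth0
```

Heartbeat runs the entries left to right on takeover (promote DRBD, mount, start services, claim the aliased IP) and in reverse order on release, which matches the ordering the post walks through.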
Removing the links keeps the service from starting until it is able to use the cluster IP.

While covered recently, the ssh problem I had was solved wonderfully by having an sshd tied to each machine's static individual IP, while creating an alternate config and keys for the cluster itself and starting an sshd with that config when a machine became the primary for the cluster.

So now that I have given a quick primer on the tech and why I needed it, I should give some more specs.

Speed on the local drive is pretty decent, even though every write also has to be copied to the remote machine over the GigE link and written there as well. I think we achieved near-local-drive performance, but that may just be because writing to two local drives mirrored in software slows things down about as much as copying to the remote machine and having it write too. Reads are done locally, so they weren't really affected. For us as a whole, the true speed limiter on the NFS mount is usually the internet connection anyway, so the speeds are far faster than our uplink.

Overall, Heartbeat was more of a pain than DRBD to set up and tune properly, but neither is all that hard to handle. Once you get into the mindset of how to use them, it becomes very easy to start expanding the idea to get more things accomplished. And the fact that all of this was accomplished with an off-the-shelf Debian install was loads of help.

As for our exposure to failures:

We started with a Dell 2450 with two P3 860MHz CPUs, a gig of RAM, and four 73GB SCSI drives in a RAID 5 array, giving about 200GB of drive space and leaving us only two drive failures away from total data loss and a restore from our backup solution.

We now have two Supermicros with 2GB of RAM each, two 3GHz cores each, and two 1TB drives in a RAID 1 each.
Now we need to lose all four drives before we have to roll back to the backup solution.

We had used 2U of rack space for the 2450, and we now use 2U for both machines. Power usage is up a little due to the increased support hardware for the drives, but our tolerance for failure before data loss has gone way up.

We may even look into doing some of these same things on other machines. We have even thought about trying this out with PostgreSQL as a cheap replication system, essentially making recovery no different from a power failure: in-progress queries are lost, but reconnect and you are back running again.

Oh, well. This was more of a ramble than a good write-up. But maybe it will give you some ideas as to what can be accomplished and make your own installs more resilient to failure.

You received this message because you are subscribed to the Google Groups "NLUG" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [EMAIL PROTECTED]. For more options, visit this group at http://groups.google.com/group/nlug-talk?hl=en
