Steven,

A hearty thanks for this! I had just started looking into this again myself, for possible use at a customer's site in Brentwood. This is a welcome recap from another "known" (!) perspective! :-)
Thanks again!

Mark

--On Monday, October 13, 2008 2:18 PM -0500 "Steven S. Critchfield" <[EMAIL PROTECTED]> wrote:

> I was hoping to wait till I put this in production before writing it up, but that is still a little while off, and it seems some of this information is needed for you all.
>
> DRBD is the Distributed Replicated Block Device.
>
> Block devices are those things you tend to deal with in big chunks, like disk drives. Linux allows you to do some interesting things with block devices. Specifically, there are layers that do some kind of work and export a block-looking device for you to use.
>
> Many of you have probably used the loopback device. This is a piece of software that takes a file and makes it look like a block device you can mount and even put files into.
>
> Some may even be aware of the MD software RAID driver. This makes more than one drive look like a single drive.
>
> Probably fewer, though a growing group, know about LVM and its ability to make multiple drives look like a single block device.
>
> DRBD is similar in that it exports a block device for you to use. The munging it does, though, is making a copy of the underlying storage on a different machine. DRBD is RAID mirroring by way of getting the data off to another machine completely.
>
> DRBD on its own is nice. You get your data backed up in real time, and you don't run the risk of the PSU dying in some spectacular way, taking out more than one drive of your RAID array and destroying your working data set.
>
> Where DRBD becomes a great thing to have is when you combine it with Heartbeat. Heartbeat will essentially ping the remote machine to verify it is up and behaving. If it is down, it will take possession of the IP you are exporting services on, fire up a configured list of services, and convert your local copy of the DRBD drive to primary and mount it up, to continue doing whatever the server is meant to be doing.
>
> In our company, we needed recovery of data quicker than our restore time from backups. The machine of most importance for this is basically just a big file server running NFS. Our push to find something for it came from a botched diagnosis of our problems, but in that botched diagnosis some of our attempted solutions showed we hadn't thought our way through this kind of failure and recovery. So in step the fruits of our research: DRBD.
>
> So, our problem is that we need fast recovery of a large NFS file share. For this we ordered 2 Supermicro servers and outfitted each with 2 1TB drives in a local mirror. Yes, I know this is not big by some people's standards. Each machine has 2GB of memory, dual-core CPUs, and dual NICs. We hooked eth1 on each machine to the other, which gives us a nice private gigabit Ethernet link. This private link is what we configure DRBD and Heartbeat to talk over.
>
> I partitioned each of the machines like this:
>
>   sda1  4GB    swap
>   sdb1  4GB    swap
>   sda2  1GB    /boot
>   sdb2  1GB    /home
>   sda3  5GB    /
>   sdb3  5GB    /var
>
> The two below are joined as md0 in a RAID 1:
>
>   sda4  990GB
>   sdb4  990GB
>
> I then used DRBD to make md0 on each machine a RAID 1 with the other machine. For lack of a better place to mount drbd0, it is mounted at /media/drbd.
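For anyone wanting to try this, here is a rough sketch of what the md0-plus-DRBD setup described above might look like. The hostnames (nfs1/nfs2), the crossover-link addresses, and the resource name r0 are my placeholders, not Steven's actual config, and the drbdadm flags are roughly the DRBD 8 syntax shipped with Debian at the time; exact flags vary by version.

    # Build the local mirror on each box (sda4 + sdb4 -> md0)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4

    # /etc/drbd.conf -- one resource, replicated over the private eth1 link
    resource r0 {
      protocol C;                      # synchronous: writes ack only after the peer has them
      on nfs1 {
        device    /dev/drbd0;
        disk      /dev/md0;
        address   192.168.10.1:7788;   # eth1 crossover address
        meta-disk internal;
      }
      on nfs2 {
        device    /dev/drbd0;
        disk      /dev/md0;
        address   192.168.10.2:7788;
        meta-disk internal;
      }
    }

    # On both machines:
    drbdadm create-md r0
    drbdadm up r0

    # On whichever machine starts out as primary:
    drbdadm -- --overwrite-data-of-peer primary r0
    mkfs.ext3 /dev/drbd0               # assuming ext3; the write-up doesn't say
    mount /dev/drbd0 /media/drbd

Protocol C's write-through behavior is presumably what lets the pull-the-plug test later in the write-up come back with a clean md5sum.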
> One of the tricks I have learned with Heartbeat is that many services have configs in /etc that you will want available on whichever machine is currently primary. So I created an etc-shared and a var-shared in /media/drbd to store these configs and their running state, recreating the normal directory structure there as best I can. Then I move the configs from their normal location to this shared location and symlink the old location to it.
>
> For instance, the configuration for Bacula (our backup solution), the NFS information, the SSH config, and such all go to live on the DRBD drive. This way, if I make changes on whatever is the primary machine, the secondary will use those same configs when it is promoted.
>
> NFS has /etc/exports, which is of interest to us in our file server setup. NFS also stores state information in /var/lib/nfs. When /var/lib/nfs is accessible to the secondary machine as it is promoted, any in-flight copies or reads continue as if nothing happened. I have copied an ISO image to the cluster, gone and pulled the power cord, watched the failover, and the copy continued, with a perfect copy according to md5sum as the result.
>
> DRBD configuration itself is pretty simple for the normal case. I did little modification of the Debian-supplied default config.
>
> Heartbeat, on the other hand, took quite a bit of tweaking. Most of Heartbeat's config is specific to how you want it to talk to the other machines, and how fast you want it to decide that the other side has gone missing rather than just being too busy to talk to you.
>
> The portion of Heartbeat that was of most importance to my project is the haresources file. I'm still not fully up on the spec for it, but it seems the entries with colons in them denote built-in commands, and the others are init.d scripts to run. My config sets our local DRBD drive as the primary in the cluster, which means we can now access it. It mounts the filesystem. We then crank up nfs-common and nfs-kernel-server. We also crank up bacula-fd so we can deal with backups. I also created a new ssh init.d script so it could fire up a cluster sshd with configs located on the DRBD drive. Finally, we fire up an aliased IP address, which is the one used for contacting the primary machine.
>
> I guess that also points out that when you define a service as being for the cluster, you need to either make a new init script that starts the service on the cluster IP only, or remove the rc links to the init script. Removing the links keeps the service from starting until it is able to use the cluster IP.
>
> While covered recently, the ssh problem I had was solved wonderfully by having an sshd tied to the static individual IPs of the machines in the cluster, while creating alternate configs and keys for the cluster itself and starting an sshd with that config when a machine became the primary for the cluster.
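The etc-shared/var-shared shuffle described above is, as I read it, just a move-and-symlink per file. A sketch, using /etc/exports and /var/lib/nfs as the examples; the directory layout under /media/drbd is my guess at his structure:

    # On the current primary, with /dev/drbd0 mounted at /media/drbd:
    mkdir -p /media/drbd/etc-shared /media/drbd/var-shared/lib

    # Move the config onto the replicated volume, leave a symlink behind
    mv /etc/exports /media/drbd/etc-shared/exports
    ln -s /media/drbd/etc-shared/exports /etc/exports

    # Same idea for NFS state, so client mounts survive a failover
    mv /var/lib/nfs /media/drbd/var-shared/lib/nfs
    ln -s /media/drbd/var-shared/lib/nfs /var/lib/nfs

On the standby box you create only the symlinks; they dangle until Heartbeat promotes it and mounts the DRBD filesystem, which is exactly the behavior you want.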
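And for the Heartbeat side, a minimal sketch of the two files involved, /etc/ha.d/ha.cf and /etc/ha.d/haresources. The hostnames, timings, resource name, and cluster IP are placeholders, and ssh-cluster stands in for whatever Steven named his custom init script.

    # /etc/ha.d/ha.cf -- how the nodes talk, and how long before a quiet peer is declared dead
    keepalive 2          # heartbeat interval, seconds
    warntime 10
    deadtime 30          # declare the peer dead after 30s of silence
    initdead 120         # extra slack while a node is still booting
    bcast eth1           # heartbeats go over the private crossover link
    auto_failback off
    node nfs1
    node nfs2

    # /etc/ha.d/haresources -- what the primary brings up, in order (one line)
    nfs1 drbddisk::r0 Filesystem::/dev/drbd0::/media/drbd::ext3 nfs-common nfs-kernel-server bacula-fd ssh-cluster IPaddr::10.0.0.50/24/eth0

The colon entries (drbddisk, Filesystem, IPaddr) are the parameterized resource scripts in /etc/ha.d/resource.d/; the bare names are ordinary init scripts, which matches the built-in-versus-init.d distinction Steven mentions.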
> So now that I have given a quick primer on the tech and why I needed it, I should give some more specs.
>
> Speed across the local drive, even though the data then has to be copied to the remote machine via the GigE link and written there as well, is pretty decent. I think we achieved near-local-drive performance, though that may just be because writing to two local drives mirrored in software slows things down about as much as copying to the remote machine and having it write as well. Reads are done locally, so they weren't really affected. For us, the true speed limiter on the NFS mount is usually the internet connection anyway, so the speeds are way faster than our uplink.
>
> Overall, Heartbeat was more of a pain than DRBD to set up and tune properly, but neither is all that hard to handle. Once you get into the mindset of how to use them, it becomes very easy to start expanding the idea to get more things accomplished. And the fact that all of this was done with an off-the-shelf Debian install was loads of help.
>
> As for our exposure to failures:
>
> We started with a Dell 2450 with 2 P3 860MHz CPUs, a gig of RAM, and 4 73GB SCSI drives in a RAID 5 array, giving about 200GB of space and only 2 drive failures away from total data loss requiring a restore from our backup solution.
>
> We are now working with 2 Supermicros with 2GB of RAM each, 2 3GHz cores each, and 2 1TB drives mirrored in each. Now we need to lose all 4 drives before we have to fall back to the backup solution.
>
> We had used 2U of rack space for the 2450, and we now use 2U for both machines. Power usage is up a little due to the extra hardware supporting the drives, but our tolerance for failure before data loss has gone way up.
>
> We may even look into doing some of these same things on other machines. We have even thought about trying this out with PostgreSQL as a cheap replication system, essentially making recovery no different from a power failure: in-progress queries are lost, but reconnect and you are back running again.
>
> Oh well. This was more of a ramble than a good write-up, but maybe it will give you some ideas as to what can be accomplished and make your own installs more resilient to failure.
>
> --
> Steven Critchfield [EMAIL PROTECTED]

________________________________________________________
Mark J. Bailey
Jobsoft Design & Development, Inc.
104 Arlington Place, Suite 100
Franklin, TN 37064
EMAIL: [EMAIL PROTECTED]  WEB: http://www.jobsoft.com/
VOICE: (615)904-9559  FAX: (615)904-9576  CELL: (615)308-9099
