On Mon, Oct 13, 2008 at 3:13 PM, Mark J Bailey <[EMAIL PROTECTED]> wrote:

> Steven,
>
> A hearty thanks for this! I had started once again looking into this for possible use at a customer's site in Brentwood. This is a welcome recap from another "known" (!) perspective! :-)
>
> Thanks again!
>
> Mark
>
> --On Monday, October 13, 2008 2:18 PM -0500 "Steven S. Critchfield" <[EMAIL PROTECTED]> wrote:
>
>> I was hoping to wait until I put this in production before writing it up, but that is still a little while off, and it seems some of this information is needed by you all now.
>>
>> DRBD is the Distributed Replicated Block Device.
>>
>> Block devices are the things you tend to deal with in big chunks, like disk drives. Linux lets you do some interesting things with block devices. Specifically, there are layers that do some kind of work and export a block-looking device for you to use.
>>
>> Many of you have probably used the loopback device. It is a piece of software that takes a file and makes it look like a block device you can mount and even put files into.
>>
>> Some may be aware of the MD software RAID driver, which makes more than one drive look like a single drive.
>>
>> A smaller but growing group knows about LVM and its ability to make multiple drives look like a single block device.
>>
>> DRBD is similar in that it exports a block device for you to use. The munging it does, though, is keeping a copy of the underlying storage on a different machine. DRBD is RAID mirroring by way of getting the data off to another machine completely.
>>
>> DRBD on its own is nice. Your data is backed up in real time, and you don't run the risk of the PSU dying in some spectacular way, taking out more than one drive of your RAID array and destroying your working data set.
>>
>> Where DRBD becomes a great thing to have is when you combine it with Heartbeat. Heartbeat essentially pings the remote machine to verify it is up and behaving. If the other side is down, Heartbeat takes possession of the IP you export services on, fires up a configured list of services, converts the local copy of the DRBD device to primary, and mounts it to continue doing whatever the server is meant to be doing.
>>
>> In our company, we needed recovery of data quicker than our restore time from backups. The machine that mattered most for this is basically just a big file server running NFS. Our push to find something for it came from a botched diagnosis of our problems, but in that botched diagnosis, some of our attempted solutions showed we hadn't thought our way through this kind of failure and recovery. So in step the fruits of our research: DRBD.
>>
>> So, our problem is that we need fast recovery of a large NFS file share. For this we ordered two Supermicro servers and outfitted each with two 1 TB drives in a local mirror. Yes, I know this is not big by some people's standards. Each machine has 2 GB of memory, dual-core CPUs, and dual NICs. We hooked eth1 on each machine to the other, giving us a nice private gigabit Ethernet link. This private link is what we configure DRBD and Heartbeat to talk over.
>>
>> I partitioned each of the machines like this:
>>
>>   sda1    4 GB   swap
>>   sdb1    4 GB   swap
>>   sda2    1 GB   /boot
>>   sdb2    1 GB   /home
>>   sda3    5 GB   /
>>   sdb3    5 GB   /var
>>
>> The two below are joined as md0 in a RAID 1:
>>
>>   sda4  990 GB
>>   sdb4  990 GB
>>
>> I then used DRBD to make md0 on each machine a RAID 1 with the other machine, and for lack of a better place to mount drbd0, it is mounted at /media/drbd.
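>>
>> To make the layering concrete, building the local mirror is just ordinary mdadm. This is a sketch rather than a paste of our exact commands, so check the flags against your version:
>>
>>   # create the local RAID 1 from the two big partitions
>>   # (device names are from the layout above)
>>   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4
>>
>>   # watch the initial sync
>>   cat /proc/mdstat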
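>>
>> The DRBD resource definition then points at md0 on each box. Roughly like this; the hostnames and private-link addresses below are placeholders, not our real ones, and the exact stanza names depend on your DRBD version:
>>
>>   resource r0 {
>>     protocol C;                    # fully synchronous writes
>>     on node1 {                     # placeholder; must match `uname -n`
>>       device    /dev/drbd0;
>>       disk      /dev/md0;          # the local mirror built above
>>       address   192.168.1.1:7788;  # the private eth1 link
>>       meta-disk internal;
>>     }
>>     on node2 {
>>       device    /dev/drbd0;
>>       disk      /dev/md0;
>>       address   192.168.1.2:7788;
>>       meta-disk internal;
>>     }
>>   }
>>
>> After that it is the usual dance: create the metadata, bring the resource up on both machines, promote one side to primary, mkfs, and mount it.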
>> One of the tricks I have learned with Heartbeat is that many services have configs in /etc that you will want available all the time. So I create an etc-shared and a var-shared in /media/drbd to store these configs and running information, and I try my best to recreate the normal directory structure there. Then I move the configs from their usual location to the shared location and symlink the old location to the new one.
>>
>> For instance, the configuration for Bacula (our backup solution), the NFS information, the SSH config, and such all live on the DRBD drive. This way, if I make changes on whichever machine is primary, the secondary will use those same configs when it is promoted.
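>>
>> The relocation itself is nothing fancy, just move-and-symlink. As an illustration (paths assumed, services stopped first):
>>
>>   # recreate the normal layout under the shared device
>>   mkdir -p /media/drbd/etc-shared /media/drbd/var-shared/lib
>>
>>   # move a config over and leave a symlink at the old location
>>   mv /etc/exports /media/drbd/etc-shared/exports
>>   ln -s /media/drbd/etc-shared/exports /etc/exports
>>
>>   # same trick for state directories
>>   mv /var/lib/nfs /media/drbd/var-shared/lib/nfs
>>   ln -s /media/drbd/var-shared/lib/nfs /var/lib/nfs
>>
>> Remember the symlinks themselves live on the local disks, so they have to be created on both machines; only what is behind them is shared.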
>> NFS has /etc/exports, which is of interest in our file-server setup. NFS also stores state information in /var/lib/nfs. When /var/lib/nfs is accessible to the secondary machine as it is promoted, any in-flight copies or reads continue as if nothing happened. I have copied an ISO image to the cluster, pulled the power cord mid-copy, watched the failover, and had the copy finish perfectly according to md5sum.
>>
>> DRBD configuration itself is pretty simple for the normal case. I did little modification of the Debian-supplied default config.
>>
>> Heartbeat, on the other hand, took quite a bit of tweaking. Most of Heartbeat's config is specific to how you want it to talk to the other machines, and how quickly you want it to decide the other side has gone missing rather than just being too busy to talk to you.
>>
>> The portion of Heartbeat most important to my project is the haresources file. I'm still not fully up on its spec, but it seems entries with colons in them denote built-in commands with parameters, and the others are init.d scripts to run (a sketch follows below). My config sets our local DRBD drive as the primary in the cluster, which means we can now access it. It mounts the filesystem. We then crank up nfs-common and nfs-kernel-server. We also crank up bacula-fd so we can deal with backups. I also created a new ssh init.d script so it can fire up a cluster sshd with configs located on the DRBD drive. Finally, we bring up an aliased IP address, the one used for contacting the primary machine.
>>
>> That points out something else: when you define a service as belonging to the cluster, you need to either write a new init script that starts the service on the cluster IP only, or remove the rc links to the init script. Removing the links keeps the service from starting until it is able to use the cluster IP.
>>
>> While covered recently, the ssh problem I had was solved wonderfully by having an sshd tied to the static individual IPs of each machine in the cluster, while creating alternate configs and keys for the cluster itself and starting an sshd with that config when a machine becomes the primary for the cluster.
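>>
>> To make the Heartbeat pieces concrete: the talk-to-each-other and how-dead-is-dead knobs live in /etc/ha.d/ha.cf. Something like the following; the values and node names are illustrative, not our production numbers:
>>
>>   # /etc/ha.d/ha.cf
>>   keepalive 2              # seconds between heartbeats
>>   deadtime 10              # declare the peer dead after this long
>>   warntime 5               # log late heartbeats before that
>>   ucast eth1 192.168.1.2   # talk over the private link
>>   auto_failback on
>>   node node1               # must match `uname -n`
>>   node node2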
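>>
>> And the promised haresources sketch. Everything after the node name is a resource: entries with :: take parameters and resolve to scripts in /etc/ha.d/resource.d or /etc/init.d. The names, filesystem type, and cluster IP here are made up:
>>
>>   # /etc/ha.d/haresources (all one line)
>>   node1 drbddisk::r0 Filesystem::/dev/drbd0::/media/drbd::ext3 nfs-common nfs-kernel-server bacula-fd ssh-cluster IPaddr::192.168.0.100/24/eth0
>>
>> Read left to right, that is the promotion order described above: become primary, mount, start NFS, bacula-fd, and the cluster sshd, then bring up the aliased IP.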
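>>
>> The cluster sshd is just a second instance with its own config and keys on the shared drive. A sketch, with made-up paths and the same placeholder cluster IP:
>>
>>   # /media/drbd/etc-shared/ssh/sshd_config (the cluster instance)
>>   ListenAddress 192.168.0.100   # bind to the cluster IP only
>>   HostKey /media/drbd/etc-shared/ssh/ssh_host_rsa_key
>>   PidFile /var/run/sshd-cluster.pid
>>
>>   # the ssh-cluster init script boils down to
>>   /usr/sbin/sshd -f /media/drbd/etc-shared/ssh/sshd_config
>>
>> The machines' own sshds stay tied to their static IPs via ListenAddress in the normal /etc/ssh/sshd_config.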
>> So now that I have given a quick primer on the tech and why I needed it, I should give some more specs.
>>
>> Write speed is pretty decent, even though every write also has to be copied to the remote machine over the GigE link and written there. I think we achieved near local-drive performance, but that might just be because writing to two local drives mirrored in software slows things down about as much as shipping the data to the remote machine and having it write as well. Reads are done locally, so they weren't really affected. For us, the true speed limit on the NFS mount is usually the Internet connection anyway, so the speeds are way faster than our uplink.
>>
>> Overall, Heartbeat was more of a pain than DRBD to set up and tune properly, but neither is all that hard to handle. Once you get into the mindset of how to use them, it becomes very easy to start expanding the idea to get more things accomplished. And the fact that all of this was done with an off-the-shelf Debian install was loads of help.
>>
>> As for our exposure to failures:
>>
>> We started with a Dell 2450 with two 860 MHz P3 CPUs, a gig of RAM, and four 73 GB SCSI drives in a RAID 5 array, giving about 200 GB of drive space and leaving us only two drive failures away from total data loss and a restore from our backup solution.
>>
>> We now have two Supermicros with 2 GB of RAM each, two 3 GHz cores each, and two 1 TB drives mirrored in each. Now we would need to lose all four drives before rolling back to the backup solution.
>>
>> The 2450 took 2U of rack space; the two new machines together also take 2U. Power usage is up a little due to the extra support hardware for the drives, but our tolerance for failure before data loss has gone way up.
>>
>> We may look into doing some of these same things for other machines. We have even thought about trying this out with PostgreSQL as a cheap replication system, essentially making recovery no different from a power failure: in-progress queries are lost, but reconnect and you are back running again.
>>
>> Oh well. This was more of a ramble than a good write-up, but maybe it will give you some ideas of what can be accomplished and make your own installs more resilient to failure.
>>
>> --
>> Steven Critchfield  [EMAIL PROTECTED]
>
> ________________________________________________________
> Mark J. Bailey                    Jobsoft Design & Development, Inc.
> 104 Arlington Place, Suite 100    Franklin, TN 37064
> EMAIL: [EMAIL PROTECTED]          WEB: http://www.jobsoft.com/
> VOICE: (615) 904-9559   FAX: (615) 904-9576   CELL: (615) 308-9099

I thought it was a pretty good introductory writeup. It doesn't directly influence anything I'm working on right now, but it was nice to see a general description and application of some of the programs you're using. $Thanks++
