Thanks for the write-up.  This is a good introduction to the subject and 
made me think about some things I hadn't considered before.

Chris


Steven S. Critchfield wrote:
> I was hoping to wait until I put this in production before writing it up, 
> but that is still a little while off, and it seems some of this information 
> is needed by you all.
>
> DRBD stands for Distributed Replicated Block Device.
>
> Block devices are the things you tend to deal with in big chunks, like disk 
> drives. Linux allows you to do some interesting things with block devices. 
> Specifically, there are layers that do some type of work and export a 
> block-looking device for you to use.
>
> Many of you have probably used the loopback device. This is a piece of 
> software that takes a file and makes it look like a block device you can 
> mount and even put files into. 
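As a quick illustration of the loopback idea, here is a sketch with throwaway paths (the filesystem and mount steps need root, so they are left as comments):

```shell
# Create a 16 MB file to act as the backing store for a "disk".
dd if=/dev/zero of=/tmp/disk.img bs=1M count=16 2>/dev/null
ls -l /tmp/disk.img

# As root you could then put a filesystem on it and mount it:
#   mke2fs /tmp/disk.img
#   mount -o loop /tmp/disk.img /mnt
```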
>
> Some may even be aware of the MD software raid driver. This makes more than 
> one drive look like a single drive.
>
> A smaller but growing group knows about LVM and its ability to make 
> multiple drives look like a single block device.
>
> DRBD is similar in that it exports a block device for you to use. The munging 
> it does, though, is making a copy of the underlying storage on a different 
> machine. DRBD is RAID mirroring by way of getting the data off to another 
> machine entirely.
>
> DRBD on its own is nice. You get your data backed up in real time, and you 
> don't run the risk of the PSU dying in some spectacular way, taking out more 
> than one drive of your RAID array and destroying your working data set.
>
> Where DRBD becomes a great thing to have is when you combine it with 
> Heartbeat. Heartbeat essentially pings the remote machine to verify it is 
> up and behaving. If it is down, it takes possession of the IP you are 
> exporting services on, fires up a configured list of services, converts 
> your local copy of the DRBD device to primary, and mounts it up to continue 
> doing whatever the server is meant to be doing.
>
> In our company, we needed recovery of data faster than our restore time 
> from backups. The machine of most importance for this is basically just a 
> big file server with NFS running on it. Our push to find something for it 
> came from a botched diagnosis of our problems. But during that botched 
> diagnosis, some of our attempted solutions showed we hadn't thought our way 
> through this kind of failure and recovery. So in steps the fruit of our 
> research, DRBD.
>
> So, our problem is that we need fast recovery of a large NFS file share. 
> For this we ordered two Supermicro servers. We outfitted them with two 1 TB 
> drives each in a local mirror. Yes, I know this is not big by some people's 
> standards. Each machine has 2 GB of memory, dual-core CPUs, and dual NICs. 
> We hooked eth1 on each machine to the other. This gives us a nice private 
> gigabit Ethernet link, which is what we configure DRBD and Heartbeat to 
> talk over.
>
> I partitioned each of the machines like this:
> sda1 4gb swap
> sdb1 4gb swap
> sda2 1gb /boot
> sdb2 1gb /home
> sda3 5gb /
> sdb3 5gb /var
>
> The two partitions below are joined as md0, a RAID 1:
> sda4 990gb 
> sdb4 990gb
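For anyone following along, joining those two partitions into md0 is a one-liner with mdadm (run as root; the device names come from the layout above):

```
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4
```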
>
> I then used DRBD to make md0 on each machine a RAID 1 with the other machine. 
> For lack of a better place to mount drbd0, it is mounted at /media/drbd.
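For the curious, a minimal DRBD resource stanza for this kind of two-node setup might look roughly like the following (hostnames, IPs, and the port are made-up placeholders, not Steven's actual values):

```
resource r0 {
  protocol C;               # synchronous: a write completes on both nodes
  on node-a {
    device    /dev/drbd0;
    disk      /dev/md0;     # the local software RAID 1
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on node-b {
    device    /dev/drbd0;
    disk      /dev/md0;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```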
>
> One of the tricks I have learned with Heartbeat is that there are many 
> services whose configs in /etc you will want to be available all the time. 
> So I created an etc-shared and a var-shared in /media/drbd to store these 
> configs and running state. I try my best to recreate the directory 
> structure there as well. Then I move the configs from their normal 
> location to this shared location and symlink the old location to it. 
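The move-and-symlink trick can be sketched like this, using throwaway paths so it is safe to run (in the real setup ETC would be /etc and SHARED would be /media/drbd/etc-shared):

```shell
ETC=/tmp/demo-etc                     # stand-in for /etc
SHARED=/tmp/demo-shared/etc-shared    # stand-in for /media/drbd/etc-shared
rm -rf /tmp/demo-etc /tmp/demo-shared
mkdir -p "$ETC" "$SHARED"

# A fake exports file standing in for /etc/exports.
echo "/srv/files 10.0.0.0/24(rw,sync)" > "$ETC/exports"

# Move the config onto the shared (DRBD-backed) area, then link it back.
mv "$ETC/exports" "$SHARED/exports"
ln -s "$SHARED/exports" "$ETC/exports"

# Services still read the same path; the data now lives on the shared device.
cat "$ETC/exports"
```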
>
> For instance, the configuration for Bacula (our backup solution), NFS 
> information, SSH config, and such all live on the DRBD drive. This way, 
> if I make changes on whichever machine is the primary, the secondary will 
> use these same configs when promoted. 
>
> NFS has /etc/exports, which is of interest in our file server setup. NFS 
> also stores state information in /var/lib/nfs. When /var/lib/nfs is 
> accessible to the secondary machine as it is promoted, any in-progress 
> copies or reads continue as if nothing happened. I have copied an ISO image 
> to the cluster, pulled the power cord, watched the failover, and the copy 
> then continued, with a perfect copy according to md5sum as the result.
>
> DRBD configuration itself is pretty simple for the normal case. I did little 
> modification of the Debian-supplied default config.
>
> Heartbeat, on the other hand, took quite a bit of tweaking. Most of 
> Heartbeat's config is specific to how you want it to talk to the other 
> machines and how fast you want it to decide the other side has gone missing 
> rather than just being too busy to talk to you. 
>
> The portion of Heartbeat most important to my project is the haresources 
> file. I'm still not fully up on the spec for it, but it seems entries with 
> colons in them denote built-in resource commands, and the others are 
> init.d scripts to run. My config sets our local DRBD device as the primary 
> in the cluster, which means we can now access it. It mounts the 
> filesystem. We then crank up nfs-common and nfs-kernel-server. We also 
> crank up bacula-fd so we can deal with backups. I also created a new ssh 
> init.d script so it could fire up a cluster sshd with configs located on 
> the DRBD drive. We finally bring up an aliased IP address, which is the 
> one used for contacting the primary machine.
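A haresources line for a stack like that might look something like this, all on one line (the node name, DRBD resource name, init scripts, and cluster IP here are invented for illustration; the colon-separated entries are Heartbeat's built-in resource scripts with their arguments):

```
node-a drbddisk::r0 Filesystem::/dev/drbd0::/media/drbd::ext3 nfs-common nfs-kernel-server bacula-fd ssh-cluster IPaddr::192.168.1.100/24/eth0
```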
>
> I guess that points out as well that when you define a service as being for 
> the cluster, you need to either make a new init script that starts the 
> service on the cluster IP only, or remove the rc links to the init script. 
> Removing the links keeps the service from starting until it is able to use 
> the cluster IP. 
>
> While covered recently, my ssh problem was solved wonderfully by keeping an 
> sshd running tied to the static individual IPs of the machines in the 
> cluster, while creating alternate configs and keys for the cluster itself 
> and starting a second sshd with that config when a machine became the 
> primary for the cluster.
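The cluster sshd trick boils down to a second sshd_config that listens only on the floating address (the paths and IP here are illustrative):

```
# The cluster copy, e.g. /media/drbd/etc-shared/ssh/sshd_config
ListenAddress 192.168.1.100                           # the floating cluster IP
HostKey /media/drbd/etc-shared/ssh/ssh_host_rsa_key   # host key shared via DRBD
PidFile /var/run/sshd-cluster.pid
```

The per-machine sshd keeps its stock config bound to that machine's own static IP, so you can always reach each box individually.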
>
> So now that I have given a quick primer on the tech and why I needed it, I 
> should give some more specs.
>
> Write speed to the local drive, even though the data also has to be copied 
> to the remote machine via the GigE link and written there as well, is 
> pretty decent. I think we achieved near local-drive performance, but that 
> might just be because writing to two local drives mirrored in software 
> slows things down about as much as copying to the remote machine and 
> having it write too. Reads are done locally, so they weren't really 
> affected. For us as a whole, the true speed limiter for the NFS mount is 
> usually the internet connection anyway, so the speeds are way faster than 
> our uplink.
>
> Overall, Heartbeat was more of a pain than DRBD to set up and tune 
> properly, but neither is all that hard to handle. Once you get into the 
> mindset of how to use them, it becomes very easy to start expanding the 
> idea to get more things accomplished. And the fact that all of this was 
> done with an off-the-shelf Debian install was loads of help.
>
> As for our exposure to failures:
>
> We started with a Dell 2450 with two P3 860 MHz CPUs, a gig of RAM, and 
> four 73 GB SCSI drives in a RAID 5 array, giving about 200 GB of drive 
> space and only two drive failures away from total data loss requiring a 
> restore from our backup solution.
>
> We are now working with two Supermicros with 2 GB of RAM each, two 3 GHz 
> cores each, and two 1 TB drives mirrored in each. Now we need to lose all 
> four drives before we have to roll back to the backup solution. 
>
> We had used 2U of space for the 2450, and we now use 2U of space for both 
> machines. Power usage is up a little due to the increased support hardware 
> for the drives, but our tolerance for failure before data loss has gone 
> way up.
>
> We may even look into doing some of these same things to other machines. We 
> have even thought about trying this out with PostgreSQL as a cheap 
> replication system, essentially making recovery no different from a power 
> failure: in-progress queries are lost, but reconnect and you are back 
> running again.
>
> Oh well. This was more of a ramble than a proper write-up. But maybe it 
> will give you some ideas as to what can be accomplished and make your own 
> installs more resilient to failure.
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"NLUG" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/nlug-talk?hl=en
-~----------~----~----~----~------~----~------~--~---
