Steven,

A hearty thanks for this!  I had just started looking into this again 
myself, for possible use at a customer's site in Brentwood.  This is a 
welcome recap from another "known" (!) perspective!  :-)

Thanks again!

Mark

--On Monday, October 13, 2008 2:18 PM -0500 "Steven S. Critchfield" 
<[EMAIL PROTECTED]> wrote:

>
> I was hoping to wait till I put this in production before writing it up,
> but that is still a little while off and it seems some of this
> information is needed for you all.
>
> DRBD is the Distributed Replicated Block Device.
>
> Block devices are those things you tend to deal with in big chunks,
> like disk drives. Linux allows you to do some interesting things with
> block devices. Specifically, there are layers that do some kind of
> work and export a block-device-like interface for you to use.
>
> Many of you have probably used the loopback device. This is a piece of
> software that takes a file and makes it look like a block device you can
> mount and even put files into.
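>
> Something like this, just to illustrate (the file name and mount
> point are arbitrary):
>
>     dd if=/dev/zero of=/tmp/disk.img bs=1M count=100
>     losetup /dev/loop0 /tmp/disk.img
>     mke2fs /dev/loop0
>     mount /dev/loop0 /mnt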
>
> Some may even be aware of the MD software raid driver. This makes more
> than one drive look like a single drive.
>
> A smaller but growing group knows about LVM, which can also make
> multiple drives look like a single block device.
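>
> Roughly like so (the volume names here are made up):
>
>     pvcreate /dev/sda /dev/sdb
>     vgcreate bigvg /dev/sda /dev/sdb
>     lvcreate -L 500G -n bigvol bigvg   # shows up as /dev/bigvg/bigvol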
>
> DRBD is similar in that it exports a block device for you to use. The
> munging it does, though, is making a copy of the underlying storage on
> a different machine. DRBD is RAID mirroring by way of getting the data
> off to another machine entirely.
>
> DRBD on its own is nice. You get your data backed up in real time, and
> you don't run the risk of the PSU dying in some spectacular way that
> takes out more than one drive of your RAID array and destroys your
> working data set.
>
> Where DRBD becomes a great thing to have is when you combine it with
> Heartbeat. Heartbeat will essentially ping the remote machine to verify
> it is up and behaving. If the remote machine is down, Heartbeat will
> take possession of the IP you are exporting services on, fire up a
> configured list of services, convert your local copy of the DRBD drive
> to primary, and mount it up to continue doing whatever the server is
> meant to be doing.
>
> In our company, we needed to be able to recover data faster than a
> restore from backups would allow. The machine where this mattered most
> is basically just a big fileserver with NFS running on it. Our push to
> find something for it came from a botched diagnosis of our problems,
> but in that botched diagnosis, some of our attempted solutions showed
> we hadn't thought our way through this kind of failure and recovery.
> So in step the fruits of our research: DRBD.
>
> So, our problem is that we need fast recovery of a large NFS file
> share. For this we ordered 2 Supermicro servers and outfitted each
> with two 1TB drives in a local mirror. Yes, I know this is not big by
> some people's standards. Each machine has 2GB of memory, dual-core
> CPUs, and dual NICs. We hooked eth1 on each machine to the other,
> which gives us a nice private gigabit Ethernet link. This private link
> is what we configure DRBD and Heartbeat to talk over.
>
> I partitioned each of the machines like this:
> sda1 4GB swap
> sdb1 4GB swap
> sda2 1GB /boot
> sdb2 1GB /home
> sda3 5GB /
> sdb3 5GB /var
>
> The two below are joined as md0, a RAID 1 array:
> sda4 990GB
> sdb4 990GB
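>
> The md0 creation is the standard mdadm invocation, something like:
>
>     mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4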
>
> I then used DRBD to make md0 on each machine one half of a RAID 1 with
> the other machine. For lack of a better place to mount drbd0, it is
> mounted at /media/drbd.
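>
> From memory, with a recent DRBD 8, bringing it up goes roughly like
> this (the resource name r0 is just what I call it):
>
>     drbdadm create-md r0    # write metadata, run on both nodes
>     drbdadm up r0           # attach and connect, run on both nodes
>     # then on whichever node should start out primary:
>     drbdadm -- --overwrite-data-of-peer primary r0
>     mkfs.ext3 /dev/drbd0
>     mount /dev/drbd0 /media/drbd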
>
> One of the tricks I have learned with Heartbeat is that there are many
> services whose configs live in /etc that you will want available on
> whichever machine is active. So I create an etc-shared and a var-shared
> in /media/drbd to store these configs and their running state, and I
> try my best to recreate the original directory structure there. Then I
> move the configs from their normal location to the shared location and
> symlink the old location to it.
>
> For instance, the configuration for Bacula (our backup solution), the
> NFS information, the SSH config, and such all go to live on the DRBD
> drive. This way, if I make changes on whichever machine is primary,
> the secondary will use these same configs when it is promoted.
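>
> The move-and-symlink pattern looks about like this, using the NFS
> pieces discussed next (the exact shared paths are illustrative):
>
>     mkdir -p /media/drbd/etc-shared /media/drbd/var-shared/lib
>     mv /etc/exports /media/drbd/etc-shared/exports
>     ln -s /media/drbd/etc-shared/exports /etc/exports
>     mv /var/lib/nfs /media/drbd/var-shared/lib/nfs
>     ln -s /media/drbd/var-shared/lib/nfs /var/lib/nfs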
>
> NFS has /etc/exports, which is of interest in our file server setup.
> NFS also stores state information in /var/lib/nfs. When /var/lib/nfs
> is accessible to the secondary machine as it is promoted, any running
> copies or reads continue as if nothing happened. I have tested this by
> copying an ISO image to the cluster, pulling the power cord mid-copy,
> watching the failover, and letting the copy continue; md5sum confirmed
> the result was a perfect copy.
>
> DRBD configuration itself is pretty simple for the normal case. I made
> few modifications to the Debian-supplied default config.
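>
> The heart of it is one resource stanza, along these lines (hostnames
> and addresses are examples; the "on" names must match uname -n):
>
>     resource r0 {
>       protocol C;                   # fully synchronous replication
>       on node1 {
>         device    /dev/drbd0;
>         disk      /dev/md0;
>         address   10.0.0.1:7788;    # the private eth1 link
>         meta-disk internal;
>       }
>       on node2 {
>         device    /dev/drbd0;
>         disk      /dev/md0;
>         address   10.0.0.2:7788;
>         meta-disk internal;
>       }
>     }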
>
> Heartbeat, on the other hand, took quite a bit of tweaking. Most of
> Heartbeat's config is specific to how you want it to talk to the other
> machines and how quickly you want it to decide the other side has gone
> missing rather than just being too busy to talk to you.
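>
> That tuning lives in ha.cf; mine looks something like this (timings
> and node names are examples, not gospel):
>
>     keepalive 1          # heartbeat interval in seconds
>     warntime 5           # complain about late heartbeats
>     deadtime 10          # declare the peer dead after this long
>     initdead 60          # allow extra time right after boot
>     bcast eth1           # talk over the private crossover link
>     auto_failback off
>     node node1
>     node node2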
>
> The portion of Heartbeat that was of most importance to my project is
> the haresources file. I'm still not fully up on the spec for it, but
> it seems entries with colons denote built-in resource scripts with
> their arguments, and the others are init.d scripts to run. My config
> sets the local DRBD drive as primary in the cluster, which means we
> can now access it, and then mounts the filesystem. We then crank up
> nfs-common and nfs-kernel-server. We also crank up bacula-fd so we can
> deal with backups. I also created a new ssh init.d script so it could
> fire up a cluster sshd with configs located on the DRBD drive. We then
> finally fire up an aliased IP address, which is the one used for
> contacting the primary machine.
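>
> Put together, the haresources line reads something like this (the
> cluster IP and the ssh-cluster script name are illustrative; resources
> start left to right and stop in reverse order):
>
>     node1 drbddisk::r0 Filesystem::/dev/drbd0::/media/drbd::ext3 \
>           nfs-common nfs-kernel-server bacula-fd ssh-cluster \
>           IPaddr::192.168.1.100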
>
> I guess that points out as well that when you define a service as
> being for the cluster, you need to either make a new init script that
> starts the service on the cluster IP only, or remove the rc links to
> the init script. Removing the links keeps the service from starting at
> boot, before it is able to use the cluster IP.
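>
> On Debian that is just:
>
>     update-rc.d -f nfs-kernel-server remove   # Heartbeat starts it instead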
>
> While covered recently, the ssh problem I had was solved wonderfully:
> each machine runs an sshd tied to its own static IP, while alternate
> configs and keys exist for the cluster itself, and a second sshd is
> started with that cluster config when a machine becomes the primary
> for the cluster.
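>
> In sshd_config terms it boils down to something like this (the IPs,
> key path, and pidfile are examples):
>
>     # /etc/ssh/sshd_config -- the per-machine daemon
>     ListenAddress 192.168.1.1
>
>     # the cluster daemon's config, kept on the DRBD drive
>     ListenAddress 192.168.1.100
>     HostKey /media/drbd/etc-shared/ssh/ssh_host_rsa_key
>     PidFile /var/run/sshd-cluster.pid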
>
> So now that I have given a quick primer on the tech and why I needed it,
> I should give some more specs.
>
> Speed across the local drive is pretty decent, even though the data
> then has to be copied to the remote machine via GigE and written there
> as well. I think we achieved near local-drive performance, but that
> might just be because writing to two local drives mirrored in software
> slows things down about as much as copying to the remote machine and
> having it write too. Reads are done locally, so they weren't really
> affected. For us as a whole, the true speed limiter on the NFS mount
> is usually the internet connection anyway, so these speeds are way
> faster than our uplink.
>
> Overall, Heartbeat was more of a pain than DRBD to set up and tune
> properly, but neither is all that hard to handle. Once you get into
> the mindset of how to use them, it becomes very easy to start
> expanding the idea to get more things accomplished. And the fact that
> all of this was accomplished with an off-the-shelf Debian install was
> loads of help.
>
> As for our exposure to failures:
>
> We started with a Dell 2450 with two P3 860MHz CPUs, a gig of RAM, and
> four 73GB SCSI drives in a RAID 5 array, giving about 200GB of drive
> space and only 2 drive failures away from total data loss and a
> restore from our backup solution.
>
> We are now working with two Supermicros, each with 2GB of RAM, two
> 3GHz cores, and two 1TB drives mirrored together. Now we would need to
> lose all 4 drives before having to roll back to the backup solution.
>
> We used 2U of rack space for the 2450, and we now use 2U for both new
> machines combined. Power usage is up a little due to the increased
> support hardware for the drives, but we can now tolerate far more
> failures before losing data.
>
> We may even look into doing some of these same things to other
> machines. We have even thought about trying this out with PostgreSQL
> as a cheap replication system, essentially making recovery no
> different from a power failure: in-progress queries are lost, but
> reconnect and you are back up and running.
>
> Oh, well. This was more of a ramble than a good write-up, but maybe it
> will give you some ideas as to what can be accomplished and make your
> own installs more resilient to failure.
>
>
> --
> Steven Critchfield [EMAIL PROTECTED]
>



________________________________________________________
Mark J. Bailey        Jobsoft Design & Development, Inc.
104 Arlington Place, Suite 100        Franklin, TN 37064
EMAIL: [EMAIL PROTECTED]      WEB: http://www.jobsoft.com/
VOICE:(615)904-9559 FAX:(615)904-9576 CELL:(615)308-9099

