Please don't top-post, it makes the thread harder to read (ie, harder to
help you).
On 08/07/16 02:01, James Ault wrote:
On Thu, Jul 7, 2016 at 10:48 AM, Lars Ellenberg
<[email protected]> wrote:
On Thu, Jul 07, 2016 at 07:16:51AM -0400, James Ault wrote:
> Here is a scenario:
>
> Two identical servers running RHEL 6.7; three RAID5 targets, with one
> logical volume group and one logical volume defined on top of each
> target; a DRBD device defined on top of each logical volume; and then
> an XFS file system defined on top of each DRBD device.
>
> The two identical servers are right on top of one another in the rack,
> and connected by a single ethernet cable for a private network.
>
> The configuration works as far as synchronization between DRBD devices.
>
> We do NOT have pacemaker as part of this configuration, at
> management's request.
>
> We have the XFS file system mounted on server1, and this file system
> is exported via NFS.
>
> The difficulty lies in performing failover actions without pacemaker
> automation.
>
> The file system is mounted, and those status flags on the file system
> are successfully mirrored to server2.
>
> If I disconnected all wires from server1 to simulate system failure,
> promoted server2 to primary on one of these file systems, and
> attempted to mount it, the error displayed was "file system already
> mounted".
>
> I have searched the xfs_admin and mount man pages thoroughly to find
> an option that would help me overcome this state.
>
> Our purpose in replicating is to preserve and recover data in case of
> failure, but we are unable to recover or use the secondary copy in our
> current configuration.
>
> How can I recover and use this data without introducing pacemaker to
> our configuration?
If you want to do manual failover (I believe we have that also
documented in the User's Guide), all you do is:

    drbdadm primary $res
    mount /dev/drbdX /some/where

That's also exactly what pacemaker would do.
If that does not work, you either have it "auto-mounted" already by
something, or you have some file system UUID conflict, or something
else is very wrong.
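To narrow down which of those it is, you could check the kernel's own
view of what is mounted before touching DRBD. A minimal sketch -- the
mount point /srv/export and the device names are placeholders for
illustration, not taken from this thread:

```shell
# Report whether a given mount point is already in use, reading the
# kernel's view directly from /proc/mounts (field 2 is the mount point).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' /proc/mounts
}

if is_mounted /srv/export; then
    echo "/srv/export is already mounted -- look for an auto-mounter"
else
    echo "/srv/export is free"
fi

# For the UUID-conflict case, compare the filesystem UUID reported for
# the DRBD device and its backing logical volume, e.g.:
#   blkid /dev/drbd0 /dev/vg0/lv0
# Two block devices exposing the same XFS UUID can trigger
# "already mounted"-style errors.
```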
I see the Manual Failover section of the DRBD 8.4.x manual, and I see
that it requires that the file system be unmounted before attempting to
promote and mount it on the secondary.
Assuming san1 is primary, and san2 is secondary.
If you want to do a "nice" failover:
a) on san1, stop whatever processes are "using" the filesystem (e.g.
   NFS, Samba, etc.)
b) on san1, umount the filesystem
c) on san1, change the DRBD resource to secondary
d) on san2, change the DRBD resource to primary
e) on san2, mount the filesystem
f) on san2, start whatever processes export the filesystem (e.g. NFS,
   Samba, etc.)
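The six steps above could be sketched as two command sequences. The
resource name r0, device /dev/drbd0, mount point /srv/export, and the
RHEL 6-style "service nfs" init script are assumptions for
illustration, not from the poster's configuration:

```shell
# --- on san1 (current primary): graceful hand-off ---
service nfs stop                # (a) stop the processes using the FS
umount /srv/export              # (b) release the filesystem
drbdadm secondary r0            # (c) demote the DRBD resource

# --- on san2 (taking over) ---
drbdadm primary r0              # (d) promote the DRBD resource
mount /dev/drbd0 /srv/export    # (e) mount the replicated filesystem
service nfs start               # (f) re-export it
```

All of these need root and a connected DRBD pair; run them in this
order or DRBD will refuse the demotion/promotion.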
What I meant by "those status flags" in my first message is that when
a node mounts a file system, that file system is marked as mounted
somewhere on that device. The "mounted" status flag is what I'm
trying to describe, and I'm not sure if I have the correct name for it.
Me neither, and I'm not familiar with XFS at all; however, the unclean
failover looks like this:
a) san1 crashes; san2 sees that the remote is missing and changes to
   disconnected status
b) on san2, change the DRBD resource to primary
c) on san2, mount the filesystem
d) on san2, start whatever processes export the filesystem (e.g. NFS,
   Samba, etc.)
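In command form the unclean path is shorter. Same hypothetical names as
the setup described above (r0, /dev/drbd0, /srv/export); the --force
variant is DRBD 8.x's override for when the peer's disk state is
unknown -- only use it when you are certain san1 is really dead,
otherwise you are setting up a split brain:

```shell
# --- on san2, after san1 has gone away ---
drbdadm cstate r0               # (a) verify DRBD saw the peer vanish
                                #     (e.g. WFConnection / StandAlone)
drbdadm primary r0 \
  || drbdadm primary --force r0 # (b) promote; force only if refused,
                                #     and only if san1 is truly dead
mount /dev/drbd0 /srv/export    # (c) mount; XFS replays its journal
service nfs start               # (d) re-export the filesystem
```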
As for step (c), this is the same process as if you were not using DRBD
at all, the machine had "crashed", you had rebooted it, and you were
now trying to mount the FS; i.e., it's just a standard unclean mount.
Maybe you need to run a fsck first, maybe there is some other
procedure, but generally, with most filesystems I've used, you simply
mount it and it will either "clean up" (if it is a journal-based FS) or
continue as normal until it encounters some corruption/error.
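For XFS in particular, the plain mount in step (c) is normally all the
"fsck" you get: mounting replays the journal automatically. Repair
tools only enter the picture if the mount itself fails. A hedged
sketch, with the same assumed device name as above:

```shell
# Normally sufficient after a crash -- mounting replays the XFS log:
mount /dev/drbd0 /srv/export

# Only if that mount fails with log/corruption errors, fall back to
# repair on the *unmounted* device:
#   xfs_repair -n /dev/drbd0    # dry run: report problems, change nothing
#   xfs_repair /dev/drbd0       # actual repair
#   xfs_repair -L /dev/drbd0    # zero the log: LAST resort, discards
#                               # any transactions not yet replayed
```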
Does pacemaker or manual failover handle the case where a file server
experiences a hard failure, so that the umount operation is impossible?
How can the secondary copy of the file system be mounted if the umount
never happened and cannot happen on server1?
Yes, pacemaker simply automates the above processes, so that the
decision to do the failover, and the actual failover process will happen
more quickly (hopefully before your clients/services notice any
interruption).
BTW, have you actually tried it yet? You should definitely test a number
of scenarios, so if you have a scenario with a specific problem, please
provide a description of what you did, what commands you tried, and the
output of those commands so we can provide better information.
Hope that helps...
--
Adam Goryachev Website Managers www.websitemanagers.com.au
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user