Please don't top-post; it makes the thread harder to read (i.e., harder to help you).

On 08/07/16 02:01, James Ault wrote:

On Thu, Jul 7, 2016 at 10:48 AM, Lars Ellenberg <[email protected]> wrote:

    On Thu, Jul 07, 2016 at 07:16:51AM -0400, James Ault wrote:
    > Here is a scenario:
    >
    > Two identical servers running RHEL 6.7,
    > Three RAID5 targets, with one logical volume group and one logical
    > volume defined on top of each target.
    > A DRBD device defined on top of each logical volume, and then an XFS
    > file system defined on top of each DRBD device.
    >
    > The two identical servers are right on top of one another in the rack,
    > and connected by a single ethernet cable for a private network.
    >
    > The configuration works as far as synchronization between DRBD devices.
    >
    > We do NOT have pacemaker as part of this configuration at management's
    > request.
    >
    > We have the XFS file system mounted on server1, and this file system
    > is exported via NFS.
    >
    > The difficulty lies in performing failover actions without pacemaker
    > automation.
    >
    > The file system is mounted, and those status flags on the file system
    > are successfully mirrored to server2.
    >
    > If I disconnected all wires from server1 to simulate system failure,
    > and promoted server2 to primary on one of these file systems, and
    > attempted to mount it, the error displayed is "file system already
    > mounted".
    >
    > I have searched the xfs_admin and mount man pages thoroughly to find
    > an option that would help me overcome this state.
    >
    > Our purpose of replication is to preserve and recover data in case of
    > failure, but we are unable to recover or use the secondary copy in
    > our current configuration.
    >
    > How can I recover and use this data without introducing pacemaker to
    > our configuration?

    If you want to do manual failover (I believe we have that also
    documented in the User's Guide), all you do is

    drbdadm primary $res
    mount /dev/drbdX /some/where

    That's also exactly what pacemaker would do.

    If that does not work,
    you have it either "auto-mounted" already by something,
    or you have some file system UUID conflict,
    or something else is very wrong.


> I see the Manual Failover section of the DRBD 8.4.x manual, and I see
> that it requires that the file system be unmounted before attempting to
> promote and mount the file system on the secondary.

Assume san1 is primary and san2 is secondary.

If you want to do a "nice" failover (example commands after the list):
a) on san1, stop whatever processes are "using" the filesystem (e.g., NFS, Samba, etc.)
b) on san1, umount the filesystem
c) on san1, change the DRBD resource to secondary
d) on san2, change the DRBD resource to primary
e) on san2, mount the filesystem
f) on san2, start whatever processes export the filesystem (e.g., NFS, Samba, etc.)
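
For example, assuming the resource is named r0, sits on /dev/drbd0, and is
mounted at /export/data (all placeholder names, adjust to your setup):

    # on san1 (current primary)
    service nfs stop                # or whatever is using the filesystem
    umount /export/data
    drbdadm secondary r0

    # on san2 (new primary)
    drbdadm primary r0
    mount /dev/drbd0 /export/data
    service nfs start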

> What I meant by "those status flags" in my first message is that when a
> node mounts a file system, that file system is marked as mounted
> somewhere on that device. The "mounted" status flag is what I'm trying
> to describe, and I'm not sure if I have the correct name for it.

Me neither; I'm not familiar with XFS at all. However, the unclean failover looks like this (example commands after the list):

a) san1 crashes; san2 sees that the peer is gone, and its DRBD connection changes to a disconnected state
b) on san2 change DRBD resource to primary
c) on san2 mount the filesystem
d) on san2 start whatever processes to export the filesystem (eg NFS, samba, etc)
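
Again with the same placeholder names, that would be roughly:

    # on san2, after san1 has dropped off the network
    drbdadm primary r0              # use "drbdadm primary --force r0" only if
                                    # DRBD refuses because the local data is
                                    # not marked UpToDate
    mount /dev/drbd0 /export/data
    service nfs start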

As far as step (c) goes, this is an identical process to if you were not using DRBD at all, the machine had "crashed", you had rebooted it, and were now trying to mount the FS; i.e., it's just a standard unclean mount. Maybe you need to run an fsck first, maybe there is some other procedure, but generally, with most FSes I've used, you simply mount it and it will either "clean up" (if it is a journal-based FS) or continue as normal until it encounters some corruption/error.
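
I can't speak for XFS in detail, but as I understand it (an assumption on
my part, check the XFS documentation), it is a journalling FS that replays
its log at mount time, so with the placeholder names from above it would
look something like:

    mount /dev/drbd0 /export/data   # XFS replays its journal during the mount
    # only if the mount fails complaining about the log:
    xfs_repair -n /dev/drbd0        # dry run, report problems without changing
    xfs_repair /dev/drbd0           # actual repair (may need -L to zero a
                                    # dirty log, which loses in-flight changes)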

> Does pacemaker or manual failover handle the case where a file server
> experiences a hard failure where the umount operation is impossible?
> How can the secondary copy of the file system be mounted if the umount
> operation never occurred and cannot occur on server1?

Yes, pacemaker simply automates the above processes, so that the decision to do the failover, and the actual failover process, happen more quickly (hopefully before your clients/services notice any interruption).

BTW, have you actually tried it yet? You should definitely test a number of scenarios; if you hit a scenario with a specific problem, please describe what you did, which commands you ran, and the output of those commands so we can provide better information.
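
For example, the output of these commands on both nodes (placeholder
resource name again) usually tells most of the story:

    cat /proc/drbd                  # connection/role/disk states (DRBD 8.x)
    drbdadm role r0
    drbdadm dstate r0
    mount | grep drbd               # is the device already mounted somewhere?
    blkid /dev/drbd0                # spot filesystem UUID clashes

The last two would also catch the "auto-mounted already" and "UUID
conflict" cases Lars mentioned.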

Hope that helps...

--
Adam Goryachev
Website Managers
www.websitemanagers.com.au