Please don't top-post, it makes the thread harder to read (ie, harder to
help you).
On 08/07/16 02:01, James Ault wrote:
On Thu, Jul 7, 2016 at 10:48 AM, Lars Ellenberg
<[email protected]> wrote:
On Thu, Jul 07, 2016 at 07:16:51AM -0400, James Ault wrote:
> Here is a scenario:
>
> Two identical servers running RHEL 6.7; three RAID5 targets, with one
> logical volume group and one logical volume defined on top of each
> target; a DRBD device defined on top of each logical volume; and then
> an XFS file system defined on top of each DRBD device.
>
> The two identical servers are right on top of one another in the rack,
> and connected by a single ethernet cable for a private network.
>
> The configuration works as far as synchronization between DRBD devices.
>
> We do NOT have pacemaker as part of this configuration, at
> management's request.
>
> We have the XFS file system mounted on server1, and this file system
> is exported via NFS.
>
> The difficulty lies in performing failover actions without pacemaker
> automation.
>
> The file system is mounted, and those status flags on the file system
> are successfully mirrored to server2.
>
> If I disconnected all wires from server1 to simulate system failure,
> promoted server2 to primary on one of these file systems, and
> attempted to mount it, the error displayed was "file system already
> mounted".
>
> I have searched the xfs_admin and mount man pages thoroughly to find
> an option that would help me overcome this state.
>
> Our purpose in replicating is to preserve and recover data in case of
> failure, but we are unable to recover or use the secondary copy in our
> current configuration.
>
> How can I recover and use this data without introducing pacemaker to
> our configuration?
If you want to do manual failover (I believe we have that also
documented in the User's Guide), all you do is:

    drbdadm primary $res
    mount /dev/drbdX /some/where

That's also exactly what pacemaker would do.
If that does not work, you either have it "auto-mounted" already by
something, or you have some file system UUID conflict, or something
else is very wrong.
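To narrow down which of those it is, you could check the kernel's own
view of what is mounted before touching DRBD. A minimal sketch -- the
mount point /srv/export and the device names are placeholders for
illustration, not taken from this thread:

```shell
# Report whether a given mount point is already in use, reading the
# kernel's view directly from /proc/mounts (field 2 is the mount point).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' /proc/mounts
}

if is_mounted /srv/export; then
    echo "/srv/export is already mounted -- look for an auto-mounter"
else
    echo "/srv/export is free"
fi

# For the UUID-conflict case, compare the filesystem UUID reported for
# the DRBD device and its backing logical volume, e.g.:
#   blkid /dev/drbd0 /dev/vg0/lv0
# Two block devices exposing the same XFS UUID can trigger
# "already mounted"-style errors.
```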
I see the Manual Failover section of the DRBD 8.4.x manual, and I see
that it requires that the file system be unmounted before attempting to
promote and mount it on the secondary.
Assuming san1 is primary, and san2 is secondary.
If you want to do a "nice" failover:
a) on san1, stop whatever processes are "using" the filesystem (e.g.
   NFS, Samba, etc.)
b) on san1, umount the filesystem
c) on san1, change the DRBD resource to secondary
d) on san2, change the DRBD resource to primary
e) on san2, mount the filesystem
f) on san2, start whatever processes export the filesystem (e.g. NFS,
   Samba, etc.)
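The six steps above could be sketched as two command sequences. The
resource name r0, device /dev/drbd0, mount point /srv/export, and the
RHEL 6-style "service nfs" init script are assumptions for
illustration, not from the poster's configuration:

```shell
# --- on san1 (current primary): graceful hand-off ---
service nfs stop                # (a) stop the processes using the FS
umount /srv/export              # (b) release the filesystem
drbdadm secondary r0            # (c) demote the DRBD resource

# --- on san2 (taking over) ---
drbdadm primary r0              # (d) promote the DRBD resource
mount /dev/drbd0 /srv/export    # (e) mount the replicated filesystem
service nfs start               # (f) re-export it
```

All of these need root and a connected DRBD pair; run them in this
order or DRBD will refuse the demotion/promotion.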
What I meant by "those status flags" in my first message is that when
a node mounts a file system, that file system is marked as mounted
somewhere on that device. The "mounted" status flag is what I'm
trying to describe, and I'm not sure if I have the correct name for it.
Me neither, and I'm not familiar with XFS at all; however, the unclean
failover looks like this:
a) san1 crashes; san2 sees that the remote is missing and changes to
   disconnected status
b) on san2, change the DRBD resource to primary
c) on san2, mount the filesystem
d) on san2, start whatever processes export the filesystem (e.g. NFS,
   Samba, etc.)
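In command form the unclean path is shorter. Same hypothetical names as
the setup described above (r0, /dev/drbd0, /srv/export); the --force
variant is DRBD 8.x's override for when the peer's disk state is
unknown -- only use it when you are certain san1 is really dead,
otherwise you are setting up a split brain:

```shell
# --- on san2, after san1 has gone away ---
drbdadm cstate r0               # (a) verify DRBD saw the peer vanish
                                #     (e.g. WFConnection / StandAlone)
drbdadm primary r0 \
  || drbdadm primary --force r0 # (b) promote; force only if refused,
                                #     and only if san1 is truly dead
mount /dev/drbd0 /srv/export    # (c) mount; XFS replays its journal
service nfs start               # (d) re-export the filesystem
```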
As for step (c), this is the same process as if you were not using DRBD
at all, the machine had "crashed", you had rebooted it, and you were
now trying to mount the FS; i.e., it's just a standard unclean mount.
Maybe you need to run a fsck first, maybe there is some other
procedure, but generally, with most filesystems I've used, you simply
mount it and it will either "clean up" (if it is a journal-based FS) or
continue as normal until it encounters some corruption/error.
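For XFS in particular, the plain mount in step (c) is normally all the
"fsck" you get: mounting replays the journal automatically. Repair
tools only enter the picture if the mount itself fails. A hedged
sketch, with the same assumed device name as above:

```shell
# Normally sufficient after a crash -- mounting replays the XFS log:
mount /dev/drbd0 /srv/export

# Only if that mount fails with log/corruption errors, fall back to
# repair on the *unmounted* device:
#   xfs_repair -n /dev/drbd0    # dry run: report problems, change nothing
#   xfs_repair /dev/drbd0       # actual repair
#   xfs_repair -L /dev/drbd0    # zero the log: LAST resort, discards
#                               # any transactions not yet replayed
```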
Does pacemaker or manual failover handle the case where a file server
experiences a hard failure, so that the umount operation is impossible?
How can the secondary copy of the file system be mounted if the umount
never happened and cannot happen on server1?
Yes, pacemaker simply automates the above processes, so that the
decision to do the failover, and the actual failover process will happen
more quickly (hopefully before your clients/services notice any
interruption).
BTW, have you actually tried it yet? You should definitely test a number
of scenarios, so if you have a scenario with a specific problem, please
provide a description of what you did, what commands you tried, and the
output of those commands so we can provide better information.
Hope that helps...
--
Adam Goryachev Website Managers www.websitemanagers.com.au
_______________________________________________
drbd-user mailing list
[email protected]
http://lists.linbit.com/mailman/listinfo/drbd-user