Peter,

You have it about half right. Since you are dealing with active filesystems
on the OST shared storage devices, and you can never truly predict the type
of failure a system will have, you need to add a software command/control
layer that manages failover, remounting the storage, and cutting power to
the failing node. Without that you risk a split-brain situation where your
OST backend filesystem gets corrupted.

You have the mkfs.lustre part right. You need to add Heartbeat/Corosync to
the configuration and set it up so the two systems monitor each other with
a watchdog heartbeat. When the healthy machine senses that the other has
failed, it shoots it (STONITH: shoot the other node in the head) via IPMI
power control or a smart rack PDU, such as an APC Ethernet-managed PDU.
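To make that concrete, here is a rough sketch of the fencing side using a
Pacemaker/Corosync stack and the pcs tool. The BMC addresses, credentials,
and resource names below are hypothetical placeholders, and the exact
fence-agent parameters vary by fence-agents version:

```shell
# Sketch: one IPMI STONITH device per node, so the surviving node
# can power-fence its dead peer. All addresses/credentials are
# placeholders -- substitute your real BMC details.
pcs stonith create fence-node1 fence_ipmilan \
    ip=10.0.0.101 username=admin password=secret lanplus=1 \
    pcmk_host_list=node1
pcs stonith create fence-node2 fence_ipmilan \
    ip=10.0.0.102 username=admin password=secret lanplus=1 \
    pcmk_host_list=node2

# Never let a node run the fence device that targets itself.
pcs constraint location fence-node1 avoids node1
pcs constraint location fence-node2 avoids node2
```

With fencing in place, the cluster will not remount an OST on the survivor
until it has confirmed the failed node is powered off.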

The Heartbeat/Corosync config takes your existing config and adds automated
directives like:

- node1 mounts sdb; node2 mounts sdc
- if node1 dies, node2 mounts sdb
- if node2 dies, node1 mounts sdc
- when the surviving node senses that the heartbeat has been restored, the
  locally defined storage gets remounted on its owning node
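The policy above can be expressed in Pacemaker roughly like this (a sketch
only; the resource names, mount points, and preference scores are
placeholders I made up, not anything your existing config requires):

```shell
# Each OST is a Filesystem resource that may run on either node.
pcs resource create ost-sdb ocf:heartbeat:Filesystem \
    device=/dev/sdb directory=/mnt/ost0 fstype=lustre
pcs resource create ost-sdc ocf:heartbeat:Filesystem \
    device=/dev/sdc directory=/mnt/ost1 fstype=lustre

# Prefer each OST's "home" node. On failure Pacemaker mounts it on
# the survivor; when the home node rejoins, it migrates back.
pcs constraint location ost-sdb prefers node1=100
pcs constraint location ost-sdc prefers node2=100
```

The location scores are what give you the active/active layout in normal
operation while still allowing either node to carry both OSTs.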

Intel's IEEL Lustre distribution does all of this sort of thing
automagically. Or you can install Lustre and the Corosync packages yourself
and configure them manually.

--Jeff


On Thu, Jan 30, 2014 at 9:21 PM, Peter Mistich
<peter.mist...@rackspace.com> wrote:

> hello,
>
> anyone here can answer a questions about OST Failover Configuration
> (Active/Active) I think I understand but want to make sure.
>
> I configure 2 oss  servernames = node1 and node2  with 2 shared drives
> /dev/sdb and /dev/sdc and  on node1
>
> I run the command on node1 mkfs.lustre --fsname=testfs --ost
> --failnode=node2 --mgsnode=msg /dev/sdb
>
> I run the command on node2 mkfs.lustre --fsname=testfs --ost
> --failnode=node1 --mgsnode=msg /dev/sdc
>
> I mount  /dev/sdb on node 1 and  mount /dev/sdc on node2
>
> if node1 fails then I just mount  /dev/sdb on node2 and that is how
> active/active works
>
> is this correct ?
>
> Thanks,
> Pete
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>



-- 
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage