Darby,

Do you mind if I ask about the setup of your Lustre systems?

I'm trying to understand how the MGS/MGT is set up for high availability.
I understand that with OSTs and MDTs all I really need is to set the
failnode when I run mkfs.lustre. However, as I understand it, you have to
use something like Pacemaker and DRBD to handle the MGS/MGT. Is this how
you approached it?
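
For what it's worth, my mental model of the two pieces looks something
like this (all NIDs, device paths, and resource names below are made up,
and the Pacemaker part is just my guess at how a DRBD-backed MGT would
be wired up):

    # OST/MDT side: declare the failover partner at format time
    mkfs.lustre --fsname=lfs1 --ost --index=0 \
        --mgsnode=10.0.0.10@tcp --mgsnode=10.0.0.11@tcp \
        --failnode=10.0.0.2@tcp /dev/mapper/ost0

    # MGS/MGT side: no failnode trick, so put the MGT on a DRBD
    # device and let Pacemaker mount it on whichever node is primary
    pcs resource create mgt-drbd ocf:linbit:drbd drbd_resource=mgt \
        op monitor interval=30s
    pcs resource master mgt-drbd-ms mgt-drbd \
        master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
        notify=true
    pcs resource create mgs ocf:heartbeat:Filesystem \
        device=/dev/drbd0 directory=/mnt/mgs fstype=lustre
    pcs constraint colocation add mgs with mgt-drbd-ms INFINITY \
        with-rsc-role=Master
    pcs constraint order promote mgt-drbd-ms then start mgs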

Brian Andrus



On 2/6/2017 12:58 PM, Vicker, Darby (JSC-EG311) wrote:
Agreed.  We are just about to go into production on our next LFS with
the setup described.  We had to get past a bug in MGS failover for
dual-homed servers, but as of last week that is resolved and everything
is working great (see the "MGS failover problem" thread on this mailing
list from this month and last).  We are in the process of syncing our
existing LFS to this new one, and I've failed over/rebooted/upgraded the
new LFS servers many times now to make sure we can do this in practice
when the new LFS goes into production.  It's working beautifully.

Many thanks to the Lustre developers for their continued efforts.  We
have been using, and have been fans of, Lustre for quite some time now,
and it just keeps getting better.

-----Original Message-----
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Ben Evans 
<bev...@cray.com>
Date: Monday, February 6, 2017 at 2:22 PM
To: Brian Andrus <toomuc...@gmail.com>, "lustre-discuss@lists.lustre.org" 
<lustre-discuss@lists.lustre.org>
Subject: Re: [lustre-discuss] design to enable kernel updates

It's certainly possible.  When I've done that sort of thing, you upgrade
the OS image on all the servers first, then boot half of them (the A
side) into the new image; all of their targets will fail over to the B
servers.  Once the A side is up, reboot the B half into the new OS.
Finally, fail back to the "normal" running state.

At least when I've done it, you'll want to drive the failovers manually
so the HA infrastructure doesn't surprise you partway through.
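
Roughly, one pass looks like this, assuming Pacemaker-managed targets
and made-up node names:

    # drain the A side onto its B partner, then upgrade/reboot A
    pcs cluster standby oss-a1
    # ...wait for oss-a1's targets to come up on the B server,
    # then reboot oss-a1 into the new image...
    pcs cluster unstandby oss-a1

    # repeat in the other direction for the B side
    pcs cluster standby oss-b1
    # ...reboot oss-b1 into the new image...
    pcs cluster unstandby oss-b1

    # failback: move each target back to its usual server
    pcs resource move ost0 oss-a1
    pcs resource clear ost0    # drop the constraint "move" leaves behind

Your HA stack may spell these differently; the point is just that each
step is an explicit command rather than something the cluster decides
on its own.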

-Ben

On 2/6/17, 2:54 PM, "lustre-discuss on behalf of Brian Andrus"
<lustre-discuss-boun...@lists.lustre.org on behalf of toomuc...@gmail.com>
wrote:

All,

I have been contemplating how Lustre could be configured so that I
could update the kernel on each server without downtime.

It seems this is _almost_ possible when you have a SAN system, since
that gives you failover for OSTs and MDTs. BUT the MGS/MGT seems to be
the problematic one, since rebooting it seems to cause downtime that
cannot be avoided.
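
(As far as I can tell, the client side at least can be pointed at more
than one MGS NID at mount time, colon-separated, so the mount itself
doesn't depend on a single MGS node. Hypothetical NIDs:

    mount -t lustre 10.0.0.10@tcp:10.0.0.11@tcp:/lfs1 /mnt/lfs1

But that only helps if something brings the MGS back up elsewhere.)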

If you have a system where the disks are physically part of the OSS
hardware, you are out of luck. The hypothetical scenario I am using is
someone running a VM whose qcow image lives on a Lustre mount (basically
an active, open file being read from and written to continuously). How
could Lustre be built to ensure that anyone on the VM would not notice a
kernel upgrade to the underlying Lustre servers?


Could such a setup be done? It seems that would be a better use case
for something like GPFS or Gluster, but being a die-hard Lustre
enthusiast, I want to at least show it could be done.


Thanks in advance,

Brian Andrus




_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
