You're also leaving out the corosync/pacemaker/stonith configuration. That is, unless you are doing manual export/import of pools.
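For the combined MGS/MDT described below, a minimal corosync/pacemaker/STONITH sketch might look roughly like the following. This is an illustration only, not the configuration from this thread; the fencing details, resource agents (ocf:heartbeat:ZFS and Filesystem), node names and mount point are all assumptions:

    # Fencing (STONITH) via IPMI for each MDS node -- BMC addresses and
    # credentials are placeholders:
    pcs stonith create fence-mds0 fence_ipmilan ip=<mds0-bmc-ip> \
        username=<user> password=<pass> pcmk_host_list=hpfs-fsl-mds0
    pcs stonith create fence-mds1 fence_ipmilan ip=<mds1-bmc-ip> \
        username=<user> password=<pass> pcmk_host_list=hpfs-fsl-mds1

    # Import the zpool and mount the combined MGS/MDT as one failover group:
    pcs resource create mdt-pool ocf:heartbeat:ZFS pool=metadata --group mgs-mdt
    pcs resource create mdt-fs ocf:heartbeat:Filesystem \
        device=metadata/meta-fsl directory=/lustre/hpfs-fsl/mdt fstype=lustre \
        --group mgs-mdt

    # Prefer the primary MDS but allow failover to the secondary:
    pcs constraint location mgs-mdt prefers hpfs-fsl-mds0=100

The same pattern repeats per OST pool on each OSS pair, with the group colocated on whichever server currently holds that pool.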
On Fri, Feb 10, 2017 at 9:03 PM, Vicker, Darby (JSC-EG311) <darby.vicke...@nasa.gov> wrote:

> Sure. Our hardware is very similar to this:
>
> https://www.supermicro.com/solutions/Lustre.cfm
>
> We are using twin servers instead of two single-chassis servers as shown
> there, but functionally this is the same; we can just fit more stuff into a
> single rack with the twin servers. We are using a single JBOD per twin
> server as shown in one of the configurations on the above page and are
> using ZFS as the backend. All servers are dual-homed on both Ethernet and
> IB. A combined MGS/MDS is at the 10.148.0.30 address for IB and X.X.98.30
> for Ethernet. The secondary MDS/MGS is on the .31 address for both networks.
> With the combined MDS/MGS, they both fail over together. This did require a
> patch from LU-8397 to get the MGS failover to work properly, so we are
> using 2.9.0 with the LU-8397 patch and are compiling our own server rpms.
> But this is pretty simple with ZFS since you don't need a patched kernel.
> The lustre formatting and configuration bits are below. I'm leaving out the
> ZFS pool creation but I think you get the idea.
>
> I hope that helps.
>
> Darby
>
>
> if [[ $HOSTNAME == *mds* ]] ; then
>
>    mkfs.lustre \
>       --fsname=hpfs-fsl \
>       --backfstype=zfs \
>       --reformat \
>       --verbose \
>       --mgs --mdt --index=0 \
>       --servicenode=${LUSTRE_LOCAL_TCP_IP}@tcp0,${LUSTRE_LOCAL_IB_IP}@o2ib0 \
>       --servicenode=${LUSTRE_PEER_TCP_IP}@tcp0,${LUSTRE_PEER_IB_IP}@o2ib0 \
>       metadata/meta-fsl
>
> elif [[ $HOSTNAME == *oss* ]] ; then
>
>    num=`hostname --short | sed 's/hpfs-fsl-//' | sed 's/oss//'`
>    num=`printf '%g' $num`
>
>    mkfs.lustre \
>       --mgsnode=X.X.98.30@tcp0,10.148.0.30@o2ib0 \
>       --mgsnode=X.X.98.31@tcp0,10.148.0.31@o2ib0 \
>       --fsname=hpfs-fsl \
>       --backfstype=zfs \
>       --reformat \
>       --verbose \
>       --ost --index=$num \
>       --servicenode=${LUSTRE_LOCAL_TCP_IP}@tcp0,${LUSTRE_LOCAL_IB_IP}@o2ib0 \
>       --servicenode=${LUSTRE_PEER_TCP_IP}@tcp0,${LUSTRE_PEER_IB_IP}@o2ib0 \
>       $pool/ost-fsl
> fi
>
>
> /etc/ldev.conf:
>
> #local           foreign/-        label             [md|zfs:]device-path   [journal-path]/-   [raidtab]
>
> hpfs-fsl-mds0    hpfs-fsl-mds1    hpfs-fsl-MDT0000  zfs:metadata/meta-fsl
>
> hpfs-fsl-oss00   hpfs-fsl-oss01   hpfs-fsl-OST0000  zfs:oss00-0/ost-fsl
> hpfs-fsl-oss01   hpfs-fsl-oss00   hpfs-fsl-OST0001  zfs:oss01-0/ost-fsl
> hpfs-fsl-oss02   hpfs-fsl-oss03   hpfs-fsl-OST0002  zfs:oss02-0/ost-fsl
> hpfs-fsl-oss03   hpfs-fsl-oss02   hpfs-fsl-OST0003  zfs:oss03-0/ost-fsl
> hpfs-fsl-oss04   hpfs-fsl-oss05   hpfs-fsl-OST0004  zfs:oss04-0/ost-fsl
> hpfs-fsl-oss05   hpfs-fsl-oss04   hpfs-fsl-OST0005  zfs:oss05-0/ost-fsl
> hpfs-fsl-oss06   hpfs-fsl-oss07   hpfs-fsl-OST0006  zfs:oss06-0/ost-fsl
> hpfs-fsl-oss07   hpfs-fsl-oss06   hpfs-fsl-OST0007  zfs:oss07-0/ost-fsl
> hpfs-fsl-oss08   hpfs-fsl-oss09   hpfs-fsl-OST0008  zfs:oss08-0/ost-fsl
> hpfs-fsl-oss09   hpfs-fsl-oss08   hpfs-fsl-OST0009  zfs:oss09-0/ost-fsl
> hpfs-fsl-oss10   hpfs-fsl-oss11   hpfs-fsl-OST000a  zfs:oss10-0/ost-fsl
> hpfs-fsl-oss11   hpfs-fsl-oss10   hpfs-fsl-OST000b  zfs:oss11-0/ost-fsl
>
>
> /etc/modprobe.d/lustre.conf:
>
> options lnet networks=tcp0(enp4s0),o2ib0(ib1)
> options ko2iblnd map_on_demand=32
>
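The omitted ZFS pool creation would typically be something along these lines. Only the pool and dataset names follow the ldev.conf entries above; the raidz2/mirror layout and device names are assumptions, not the actual commands used on these systems:

    # Hypothetical example for the first OSS -- layout and device names are
    # assumptions. cachefile=none keeps the pool from being auto-imported at
    # boot, which is what you want when a failover partner may take it over.
    zpool create -o cachefile=none -O canmount=off oss00-0 \
        raidz2 /dev/mapper/disk00 /dev/mapper/disk01 /dev/mapper/disk02 \
               /dev/mapper/disk03 /dev/mapper/disk04 /dev/mapper/disk05

    # Similarly on the MDS, e.g. a mirrored pool named "metadata":
    zpool create -o cachefile=none -O canmount=off metadata \
        mirror /dev/mapper/ssd00 /dev/mapper/ssd01
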
> -----Original Message-----
> From: Brian Andrus <toomuc...@gmail.com>
> Date: Friday, February 10, 2017 at 12:07 AM
> To: Darby Vicker <darby.vicke...@nasa.gov>, Ben Evans <bev...@cray.com>, "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> Subject: Re: [lustre-discuss] design to enable kernel updates
>
> Darby,
>
> Do you mind if I inquire about the setup for your lustre systems?
> I'm trying to understand how the MGS/MGT is set up for high availability.
> I understand that with OSTs and MDTs all I really need is to have the
> failnode set when I do the mkfs.lustre.
> However, as I understand it, you have to use something like pacemaker
> and drbd to deal with the MGS/MGT. Is this how you approached it?
>
> Brian Andrus
>
>
> On 2/6/2017 12:58 PM, Vicker, Darby (JSC-EG311) wrote:
> > Agreed. We are just about to go into production on our next LFS with the
> > setup described. We had to get past a bug in the MGS failover for
> > dual-homed servers, but as of last week that is done and everything is
> > working great (see the "MGS failover problem" thread on this mailing list
> > from this month and last). We are in the process of syncing our existing
> > LFS to this new one, and I've failed over/rebooted/upgraded the new LFS
> > servers many times now to make sure we can do this in practice when the
> > new LFS goes into production. It's working beautifully.
> >
> > Many thanks to the lustre developers for their continued efforts. We have
> > been using and have been fans of lustre for quite some time now and it
> > just keeps getting better.
> >
> > -----Original Message-----
> > From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of Ben Evans <bev...@cray.com>
> > Date: Monday, February 6, 2017 at 2:22 PM
> > To: Brian Andrus <toomuc...@gmail.com>, "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.org>
> > Subject: Re: [lustre-discuss] design to enable kernel updates
> >
> > It's certainly possible. When I've done that sort of thing, you upgrade
> > the OS on all the servers first, boot half of them (the A side) to the
> > new image, and all the targets will fail over to the B servers. Once the
> > A side is up, reboot the B half to the new OS. Finally, do a failback to
> > the "normal" running state.
> >
> > At least when I've done it, you'll want to do the failovers manually so
> > the HA infrastructure doesn't surprise you for any reason.
> >
> > -Ben
> >
> > On 2/6/17, 2:54 PM, "lustre-discuss on behalf of Brian Andrus"
> > <lustre-discuss-boun...@lists.lustre.org on behalf of toomuc...@gmail.com>
> > wrote:
> >
> >> All,
> >>
> >> I have been contemplating how lustre could be configured such that I
> >> could update the kernel on each server without downtime.
> >>
> >> It seems this is _almost_ possible when you have a SAN system so you
> >> have failover for OSTs and MDTs. BUT the MGS/MGT seems to be the
> >> problematic one, since rebooting that seems to cause downtime that cannot
> >> be avoided.
> >>
> >> If you have a system where the disks are physically part of the OSS
> >> hardware, you are out of luck. The hypothetical scenario I am using is
> >> if someone had a VM that was a qcow image on a lustre mount (basically
> >> an active, open file being read/written to continuously). How could
> >> lustre be built to ensure anyone on the VM would not notice a kernel
> >> upgrade to the underlying lustre servers?
> >>
> >> Could such a setup be done? It seems that would be a better use case for
> >> something like GPFS or Gluster, but being a die-hard lustre enthusiast,
> >> I want to at least show it could be done.
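Done by hand on a ZFS-backed setup like the one above, the per-target failover Ben describes would look roughly like this. The mount points are assumptions, and sites driving mounts through /etc/ldev.conf and the lustre init script would normally let that handle the mounting instead:

    # On the server being rebooted (e.g. hpfs-fsl-oss00): stop the target and
    # release the pool.
    umount /lustre/hpfs-fsl-OST0000
    zpool export oss00-0

    # On its failover partner (hpfs-fsl-oss01): pick up the pool and start the
    # target there.
    zpool import oss00-0
    mount -t lustre oss00-0/ost-fsl /lustre/hpfs-fsl-OST0000

    # Clients reconnect via the second --servicenode NID; reverse the steps to
    # fail back once the rebooted server is up on the new kernel.
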
> >>
> >> Thanks in advance,
> >>
> >> Brian Andrus

--
------------------------------
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org