On Fri, 2019-09-13 at 05:14 -0400, [email protected] wrote: [SNIP]
> Moving a non-HA subnet manager from primary to backup and back again
> has worked for us without disruption, but I would try to do this in a
> maintenance window.
>

Not on GPFS, but in the past I have moved from one subnet manager to
another with dozens of running MPI jobs and Lustre running over the
fabric, and not missed a beat. My current cluster uses 10 and 40Gbps
Ethernet for GPFS, with Omni-Path exclusively for MPI traffic.

To be honest, I just cannot wrap my head around the idea that you would
not be running two subnet managers in the first place. Just fire up two
subnet managers (whether on a switch or a node) and forget about it.
They will automatically work together to give you an HA solution. It is
the same with Omni-Path too.

I would also note that you can fire up more than two fabric managers
and it all "just works". If it were me, and I didn't have fabric
managers running on at least two of my switches and I was doing GPFS
over InfiniBand, I would fire up fabric managers on all of my NSD
servers.

JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
