Bernd Schubert wrote:
> Hello Cory,
>
> On 09/17/2010 11:31 PM, Cory Spitz wrote:
>> Hi, Bernd.
>>
>> On 09/17/2010 02:48 PM, Bernd Schubert wrote:
>>> On Friday, September 17, 2010, Andreas Dilger wrote:
>>>> On 2010-09-17, at 12:42, Jonathan B. Horen wrote:
>>>>> We're trying to architect a Lustre setup for our group, and want to
>>>>> leverage our available resources. In doing so, we've come to consider
>>>>> multi-purposing several hosts, so that they'll function simultaneously
>>>>> as MDS & OSS.
>>>>
>>>> You can't do this and expect recovery to work in a robust manner. The
>>>> reason is that the MDS is a client of the OSS, and if they are both on
>>>> the same node that crashes, the OSS will wait for the MDS "client" to
>>>> reconnect and will time out recovery of the real clients.
>>>
>>> Well, that is some kind of design problem. Even on separate nodes it
>>> can easily happen that both the MDS and an OSS fail, for example in a
>>> power outage of the storage rack. In my experience, situations like
>>> that happen frequently...
>>
>> I think that just argues that the MDS should be on a separate UPS.
Or dual-redundant UPS devices driving all "critical infrastructure".
Redundant power supplies are the norm for server-class hardware, and they
should be cabled to different circuits (each of which needs to be sized to
sustain the maximum power draw).

> Well, there is not only a single reason. The next hardware issue is
> that maybe an IB switch fails.

Sure, but that's also easy to address (in theory): put OSS nodes on
different leaf switches than MDS nodes, and put the failover pairs on
different switches as well. In practice, IB switches probably do not fail
often enough to worry about recovery glitches, especially if they have
redundant power, but I certainly recommend putting failover partners on
different switch chips so that in case of a failure it is still possible
to get the system up.

I would also recommend using bonded network interfaces to avoid
cable-failure issues (i.e., connect both OSS nodes to both of the leaf
switches, rather than one to each), though there are some outstanding
issues with Lustre on bonded IB (patches in bugzilla). I would likewise
recommend multipath to disk (loss of connectivity to disk was mentioned
at LUG as one of the biggest causes of Lustre issues). In general it is
easier to have redundant cables than to ensure your HA package properly
monitors cable status and fails over when required.

> And I have also seen cascading Lustre failures. It starts with an LBUG
> on the OSS, which triggers another problem on the MDS...

Yes, that's why bugs are fixed. panic_on_lbug may help stop the problem
before it spreads, depending on the issue.

> Also, for us this actually will become a real problem, which cannot be
> easily solved. So this issue will become a DDN priority.
>
> Cheers,
> Bernd
>
> --
> Bernd Schubert
> DataDirect Networks

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
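[Editor's note: the bonded-interface setup described in the thread (each
server cabled to both leaf switches, so a cable or switch failure is
transparent) can be sketched as below. This is a generic Linux bonding
example, not a Lustre-specific recipe; the interface names (bond0, ib0,
ib1), the RHEL-style ifcfg paths, and the LNET line are assumptions, and
as the thread notes, o2ib over a bonded interface had outstanding issues
at the time.]

```shell
# Hypothetical sketch: active-backup bond over two IB ports, one port
# cabled to each leaf switch. All names and paths are assumptions.

# /etc/modprobe.d/bonding.conf -- active-backup is the only bonding mode
# usable with IPoIB; miimon polls link state every 100 ms:
#   alias bond0 bonding
#   options bonding mode=active-backup miimon=100

# /etc/sysconfig/network-scripts/ifcfg-bond0 (RHEL-style):
#   DEVICE=bond0
#   IPADDR=10.0.0.10
#   NETMASK=255.255.255.0
#   ONBOOT=yes
#   BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-ib0 (likewise for ifcfg-ib1):
#   DEVICE=ib0
#   MASTER=bond0
#   SLAVE=yes
#   ONBOOT=yes

# LNET would then be pointed at the bonded interface, e.g. in
# /etc/modprobe.d/lustre.conf (subject to the bonded-IB caveats above):
#   options lnet networks=o2ib(bond0)

# After bringing the bond up, check which slave is currently active:
cat /proc/net/bonding/bond0
```

The active-backup mode trades aggregate bandwidth for simplicity: only one
port carries traffic at a time, but failover needs no switch-side support.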

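[Editor's note: as a concrete illustration of the panic_on_lbug
suggestion above, a minimal sketch follows. The /proc path reflects a
1.8-era layout and the exact paths and timeout value are assumptions;
check your Lustre version's documentation.]

```shell
# Make an LBUG panic the node immediately, instead of leaving a wedged
# OSS running that can drag the MDS (its client) down with it.

# 1.8-era /proc interface (path is an assumption; verify on your system):
echo 1 > /proc/sys/lnet/panic_on_lbug

# Equivalent on releases where it is exposed as a tunable parameter:
lctl set_param panic_on_lbug=1

# Pair it with an automatic reboot after panic so the failover partner
# (or the rebooted node itself) can bring the targets back:
echo 30 > /proc/sys/kernel/panic
```

The point is to convert a hung, half-alive server into a clean failure
that HA software can detect and recover from.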