Hello Cory,

On 09/17/2010 11:31 PM, Cory Spitz wrote:
> Hi, Bernd.
>
> On 09/17/2010 02:48 PM, Bernd Schubert wrote:
>> On Friday, September 17, 2010, Andreas Dilger wrote:
>>> On 2010-09-17, at 12:42, Jonathan B. Horen wrote:
>>>> We're trying to architect a Lustre setup for our group, and want to
>>>> leverage our available resources. In doing so, we've come to consider
>>>> multi-purposing several hosts, so that they'll function simultaneously
>>>> as MDS & OSS.
>>>
>>> You can't do this and expect recovery to work in a robust manner. The
>>> reason is that the MDS is a client of the OSS, and if they are both on the
>>> same node that crashes, the OSS will wait for the MDS "client" to
>>> reconnect and will time out recovery of the real clients.
>>
>> Well, that is some kind of design problem. Even on separate nodes it can
>> easily happen, that both MDS and OSS fail, for example power outage of the
>> storage rack. In my experience situations like that happen frequently...
>>
>
> I think that just argues that the MDS should be on a separate UPS.

Well, a power outage is not the only possible cause. Another hardware
issue is a failing IB switch. I have also seen cascading Lustre
failures: it starts with an LBUG on the OSS, which triggers another
problem on the MDS...

Also, for us this will actually become a real problem that cannot be
easily solved, so this issue will become a DDN priority.

Cheers,
Bernd

--
Bernd Schubert
DataDirect Networks
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
