Hi, Bernd.

On 09/17/2010 02:48 PM, Bernd Schubert wrote:
> On Friday, September 17, 2010, Andreas Dilger wrote:
>> On 2010-09-17, at 12:42, Jonathan B. Horen wrote:
>>> We're trying to architect a Lustre setup for our group, and want to
>>> leverage our available resources. In doing so, we've come to consider
>>> multi-purposing several hosts, so that they'll function simultaneously
>>> as MDS & OSS.
>>
>> You can't do this and expect recovery to work in a robust manner. The
>> reason is that the MDS is a client of the OSS, and if they are both on the
>> same node that crashes, the OSS will wait for the MDS "client" to
>> reconnect and will time out recovery of the real clients.
>
> Well, that is some kind of design problem. Even on separate nodes it can
> easily happen, that both MDS and OSS fail, for example power outage of the
> storage rack. In my experience situations like that happen frequently...
>

I think that just argues that the MDS should be on a separate UPS.

> I think some kind of pre-connection would be required, where a client can tell
> a server that it was rebooted and that the server shall not wait any
> longer for it. Actually, that shouldn't be difficult, as different
> connection flags already exist. So if the client contacts a server and asks for an
> initial connection, the server could check for that NID and then immediately
> abort recovery for that client.
>
>
> Cheers,
> Bernd
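
For what it's worth, the pre-connection idea quoted above could be sketched roughly as follows. This is only a toy model in Python, not Lustre code: `RecoveryWindow`, `connect()`, and the `initial` flag are hypothetical names made up for illustration, and the NIDs are invented. The server keeps the set of NIDs it is still waiting for; a client that reconnects normally is counted as recovered, while one that asks for an initial connection is assumed to have rebooted and is dropped from the wait set at once.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryWindow:
    """Toy model of a server waiting for pre-crash clients (hypothetical)."""
    expected: set                                  # NIDs still being waited for
    recovered: set = field(default_factory=set)    # reconnected with state to replay
    dropped: set = field(default_factory=set)      # rebooted, nothing to replay

    def connect(self, nid: str, initial: bool) -> str:
        """Handle a connect request; `initial` models the proposed flag."""
        if nid not in self.expected:
            return "unknown"        # not one of the pre-crash clients
        self.expected.discard(nid)
        if initial:
            # The client says it was rebooted, so stop waiting for it.
            self.dropped.add(nid)
            return "dropped"
        self.recovered.add(nid)
        return "recovered"

    def done(self) -> bool:
        """Recovery can finish once no expected client is outstanding."""
        return not self.expected


# The MDS "client" (10.0.0.1@tcp here) crashed together with the OSS.
win = RecoveryWindow(expected={"10.0.0.1@tcp", "10.0.0.2@tcp", "10.0.0.3@tcp"})
win.connect("10.0.0.2@tcp", initial=False)
win.connect("10.0.0.3@tcp", initial=False)
still_waiting = not win.done()      # the rebooted MDS is still outstanding
win.connect("10.0.0.1@tcp", initial=True)
finished = win.done()               # no need to wait for the timeout
```

In this model the real clients finish recovery as soon as the rebooted node announces itself, instead of waiting for the recovery timer to expire. Whether the existing connection flags can carry that information is a separate question.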
