On Tue, 2009-04-07 at 09:20 +0200, Arne Wiebalck wrote:
> Brian,

Arne,

> what about if you have multiple clients, all having transactions with
> the OSS open. Now the OSS goes down and comes back. From what I
> understand, the server goes into recovery and rejects new connections 
> before recovery is finished (correct?).

Correct.

> What if all but one client
> reconnect, i.e. you lose one client: are the transactions of the
> successfully reconnected clients replayed or are they discarded?

If the lost client has a transaction that needs to be replayed, all of
the transactions up to that missing transaction are replayed, but all
subsequent transactions are discarded, and when the recovery timer
expires, recovery is aborted.
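
To make that concrete, here is a toy sketch in Python. It is not Lustre
code: the function name and the (transno, client) tuple layout are made
up for illustration, and it ignores the recovery timer entirely. It only
models "replay in order until you hit a transaction whose client never
came back".

```python
def replay_in_order(transactions, connected_clients):
    """Toy model of in-order replay (hypothetical, not Lustre code).

    transactions: iterable of (transno, client) pairs.
    connected_clients: set of clients that reconnected.
    Returns (replayed, not_replayed) as lists of transaction numbers.
    """
    replayed, not_replayed = [], []
    gap_found = False
    for transno, client in sorted(transactions):
        if not gap_found and client in connected_clients:
            replayed.append(transno)
        else:
            # The first transaction owned by a missing client opens a gap;
            # it and everything after it is not replayed.
            gap_found = True
            not_replayed.append(transno)
    return replayed, not_replayed


# Client C never reconnects, and it owns transaction 3.
txns = [(1, "A"), (2, "B"), (3, "C"), (4, "A"), (5, "B")]
replayed, not_replayed = replay_in_order(txns, {"A", "B"})
print(replayed)      # [1, 2]
print(not_replayed)  # [3, 4, 5]
```

So A and B lose transactions 4 and 5 even though they reconnected fine.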

The semantics of this will change when VBR (version-based recovery)
becomes available in 1.8.something, where something might even be 0.
In that case, only transactions that actually depend on the missing
transactions will be discarded.
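
As a rough illustration of that difference, here is another toy sketch
in Python. Again, this is not how Lustre implements it: the explicit
depends_on list is an assumption I am using to stand in for whatever
dependency tracking VBR actually does, and the names are hypothetical.

```python
def replay_with_dependencies(transactions, connected_clients):
    """Toy model of dependency-aware replay (hypothetical, not Lustre code).

    transactions: iterable of (transno, client, depends_on) tuples, where
    depends_on lists earlier transaction numbers this one relies on.
    Returns (replayed, discarded) as lists of transaction numbers.
    """
    lost = set()
    replayed = []
    for transno, client, depends_on in sorted(transactions, key=lambda t: t[0]):
        if client not in connected_clients or lost & set(depends_on):
            # Either the owner never came back, or this transaction relies
            # (directly or transitively) on one that could not be replayed.
            lost.add(transno)
        else:
            replayed.append(transno)
    return replayed, sorted(lost)


# Client C never reconnects; 4 depends on C's transaction 3, and 6 on 4.
txns = [
    (1, "A", []),
    (2, "B", []),
    (3, "C", []),
    (4, "A", [3]),
    (5, "B", [2]),
    (6, "A", [4]),
]
replayed, discarded = replay_with_dependencies(txns, {"A", "B"})
print(replayed)   # [1, 2, 5]
print(discarded)  # [3, 4, 6]
```

Here transaction 5 survives because nothing it depends on was lost.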

> Independent from the load? I think the 'official' statement was that the
> cluster has to be quiescent, i.e. no client activity. Is that (still)
> true?

Yes, that is still the official statement, and I don't think any further
testing has been done that would change it officially. The general
feeling is that quiescence should not be necessary; we just don't have
the rigorous testing to be assured of that.

So if you want to be safe, quiesce the filesystem first.  :-)

b.

