Re: [Lustre-discuss] Hastening lustrefs recovery

Brian J. Murrell Thu, 16 Jul 2009 17:15:49 -0700

On Thu, 2009-07-16 at 18:59 -0400, Josephine Palencia wrote:
> 
> What determines the speed at which a lustre fs will recover? (ex. after a 
> crash)


How fast all of the clients can reconnect and replay their pending
transactions.

> Can (should) one hasten the recovery by tweaking some parameters?

There's not much to tweak.  Recovery waits for a) all clients to
reconnect and replay or b) the recovery timer to run out.  The recovery
timer is a factor of obd_timeout.  As you probably know obd_timeout has
a value below which you will start to see timeouts and evictions --
which you don't want of course.  So you don't really want to set it
below that value.
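
As a rough illustration of the two exit conditions above, here is a toy
model in Python. It is not Lustre code: the function name, the
TIMER_FACTOR multiplier and the sample numbers are all made up for the
example, and the real relationship between obd_timeout and the recovery
timer depends on your Lustre version.

```python
def recovery_duration(replay_times, expected_clients, recovery_timer):
    """Return the seconds until recovery ends in this toy model.

    replay_times     -- time (s) at which each returning client finished replay
    expected_clients -- number of clients the server is waiting for
    recovery_timer   -- upper bound (s), assumed to be derived from obd_timeout
    """
    # Only clients that finish before the timer expires count.
    finished_in_time = [t for t in replay_times if t <= recovery_timer]
    if len(finished_in_time) == expected_clients:
        # Case a): everyone reconnected and replayed, so recovery ends
        # as soon as the slowest client is done.
        return max(finished_in_time, default=0.0)
    # Case b): at least one client never made it, so the timer decides.
    return recovery_timer


TIMER_FACTOR = 3      # hypothetical multiplier, NOT a real Lustre constant
obd_timeout = 100     # seconds, example value only
timer = TIMER_FACTOR * obd_timeout

print(recovery_duration([12, 40, 75], 3, timer))  # all three clients return
print(recovery_duration([12, 40], 3, timer))      # one client never returns
```

In the second call a single missing client stretches recovery to the
full timer, which is why raising obd_timeout raises the worst case.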

The first question people tend to ask when they discover they need to
tune their obd_timeout upwards to avoid lock callback timeouts and so
forth is "why don't I just set that really high then?".  The answer is
always "because the higher you set it, the longer your recovery process
will take in the event that not all clients are available to replay".

Of course, the bigger your client count, the higher the odds that at
least one client will not be available to reconnect, in which case the
recovery timeout, and not client replay, is your deciding factor.
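
To put a rough number on that, here is a back-of-the-envelope sketch in
Python. It assumes each client is independently unavailable with the
same probability, which is a simplification of mine and not something
Lustre defines; the 1% figure is purely illustrative.

```python
def prob_timer_decides(num_clients, p_client_missing):
    """Probability that at least one of num_clients fails to reconnect.

    Assumes clients are missing independently, each with probability
    p_client_missing (a modelling assumption, not a Lustre parameter).
    """
    # P(all clients return) = (1 - p)^n, so take the complement.
    return 1.0 - (1.0 - p_client_missing) ** num_clients


# Illustrative only: 1% chance that any given client is unavailable.
for n in (10, 40, 400, 4000):
    print(n, round(prob_timer_decides(n, 0.01), 3))
```

Under that assumption the chance of falling through to the timer climbs
quickly with the client count.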

Of interest to all of this is that in 1.8, adaptive timeouts (AT) are
enabled by default, so obd_timeout should generally always be high
enough without being too high -- i.e. optimal.  So if your OSSes and MDS
are tuned such that they are not overwhelming their disk backend,
obd_timeout should be reasonable and therefore recovery should be
reasonable.

> For 4 OSTS each with 7TB, ~40 connected clients , recovery time 
> is 48min. Is that reasonable or is that too long?

Wow.  That seems long.  That is recovery of what?  A single OST or
single OSS, or something other?

b.
