On Jan 21, 2015, at 5:25 PM, Nathan Stratton Treadway <[email protected]> wrote:

> On Wed, Jan 21, 2015 at 16:39:33 -0500, Chris Hoogendyk wrote:
>> If some of the DLEs were fully estimated in short time, would those
>> also fail just because other DLEs on the same host caused long time
>> delays?
> 
> Yes -- to over-simplify a bit, Amanda waits for all the estimates from a
> particular machine to complete before proceeding with any dumps from
> that machine…


(Was waiting for this to be confirmed before I chimed in.  I thought this
was the case.)

> 
>> I just find it odd that things were working smoothly up to the 16th
>> and then consistently and completely failing after the 16th.
> 
> Can you go back and see how long the estimate was taking before the
> 16th?
> 
> If it was nowhere near 6 hours, then probably something suddenly made it
> stop working (e.g. a hung NFS mount, as Joi mentioned).
> 
> If it was just a few minutes under 6 hours, then maybe the file count
> just grew enough that it tipped over the estimate timeout, in which case
> bumping the timeout in the config might be enough to get things working
> again with the minimum of changes.  (However, 6 hours seems like a long
> time to be waiting for the estimate phase, so switching to a different
> estimate method might make sense in terms of speeding your overall
> run....)
> 
>                                                       Nathan

And if you can’t find out how long it took before, then increasing the
timeout by just a little over its current value (add maybe 5 minutes?) ought
to tell you whether this scenario has happened.

My ETIMEOUT is set at 2000 seconds per filesystem, which I believe is per DLE
and is multiplied by the number of DLEs on each client (as I read the notes).
That value is based on an estimate from 13 years ago; the estimates probably
take much less time now, as my clients are faster, my network pipelines are
faster, and none of my clients are quite as far away.  2000 seconds is about
33 minutes, and then multiply that by the number of DLEs.
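
To make the arithmetic above concrete, here is a small Python sketch (not
Amanda code; the function name client_estimate_timeout is made up for
illustration).  It assumes, as described above, that a positive etimeout is
applied per DLE and multiplied by the number of DLEs on the client.  The
negative-value branch reflects my reading of the amanda.conf man page (a
negative etimeout is a total per client) and should be treated as an
assumption to check against your version.

```python
def client_estimate_timeout(etimeout: int, num_dles: int) -> int:
    """Seconds the planner would wait for all estimates from one client.

    Assumption: positive etimeout is per DLE, so the wait scales with the
    number of DLEs.  Assumption (from the amanda.conf man page as I read
    it): a negative etimeout is a flat total per client.
    """
    if num_dles < 0:
        raise ValueError("num_dles must be non-negative")
    if etimeout < 0:
        return -etimeout
    return etimeout * num_dles


def as_hms(seconds: int) -> str:
    """Format a number of seconds as h:mm:ss for easier reading."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # With etimeout 2000, show how the per-client wait grows with DLE count.
    for dles in (1, 4, 11):
        total = client_estimate_timeout(2000, dles)
        print(f"{dles:2d} DLEs -> {total} s ({as_hms(total)})")
```

Under these assumptions, a client only reaches a 6-hour (21600 s) wait at
etimeout 2000 once it has 11 or more DLEs.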

Still, my backups all complete in 6-8 hours these days, so nobody is taking
6 hours just for his estimate.  I have one client node that does, sometimes,
and then I ask for him (it?) to be rebooted cuz he’s sick.

Deb Baddorf
Fermilab

> 
> ----------------------------------------------------------------------------
> Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
> Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
> GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
> Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239

