On 10/14/2014 06:14 PM, Debra S Baddorf wrote:
(I’ve joined the amanda-hackers list too. Would this be better there?)

Since amanda 3.3.3 (when I started using bsdtcp and krb5 auth types, in
addition to bsd), I’m getting a 3 minute timeout on connection to a node
that is down. I’ve deduced that I can lower “connect-tries” from the
default of 3 down to 2. This reduces my freeze time from 9 minutes to 6.
I’m not sure lowering “connect-tries” to 1 is safe.

Where is the 3 minute timeout coming from?
It comes from the connect system call.
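
To see how long a blocking connect takes on a given host, a small probe like the one below may help. This is only a sketch in Python, not Amanda code: the function name `probe` and its return shape are made up for illustration. With `timeout=None` the wait is left entirely to the kernel's TCP retry schedule; with a number, the caller caps it.

```python
import socket
import time


def probe(host, port, timeout=None):
    """Attempt one TCP connect; return (ok, seconds_elapsed, error_text).

    Hypothetical helper for illustration only, not part of Amanda.
    """
    start = time.monotonic()
    try:
        # timeout=None means a plain blocking connect(): the kernel decides
        # when to give up. A numeric timeout caps the wait in user space.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers "Connection timed out", "Connection refused", and
        # socket.timeout (a subclass of OSError).
        return False, time.monotonic() - start, str(exc)
```

Calling `probe(client, port)` against a node that is down and comparing the elapsed time with `probe(client, port, timeout=30)` should show whether the delay is the kernel's own connect timeout rather than anything configured in Amanda.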

It doesn’t seem to be etimeout (which I’ve got set at 29 minutes)
nor dtimeout (set to 30 minutes)
nor ctimeout (set to 30 seconds, and I *did* try changing this one).

Is this a non-amanda  system level delay?  My expert & I can’t think of one.

Is there another timeout parameter in amanda  that I can play with?

I’ve got 39 nodes to back up, so I don’t always know when one is down for
some reason. And with 39 clients, somehow the 9 minute freeze time (before
I lowered the TRIES to 2) manages to make OTHER clients fail too, even
though they are running fine. This is actually the part that bothers me!

PS: this 3 minute per-try timeout was timed using amcheck but happens
during amdump too. Both amcheck and amdump have other nodes failing if one
node is down.

PPS: the connect failure is between these two lines (with debug-auth and
debug-protocol both set to 5):
KRB5 node down:
Tue Oct 14 16:44:06 2014: thd-0x8fc6340: amcheck-clients: connect_port: Try  port 50000: available - Success
Tue Oct 14 16:47:15 2014: thd-0x8fc6340: amcheck-clients: connect_portrange: Connect from 0.0.0.0:50000 failed: Connection timed out
BSDTCP node down:
Tue Oct 14 16:35:20 2014: thd-0x86e8340: amcheck-clients: connect_port: Try  port 517: available - Success
Tue Oct 14 16:38:29 2014: thd-0x86e8340: amcheck-clients: connect_portrange: Connect from 0.0.0.0:517 failed: Connection timed out
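
The gap between each pair of log lines above can be worked out exactly. The Python sketch below does that and compares the result with a TCP SYN retry schedule. The schedule values (3 second initial wait, 5 retries, doubling each time) are an assumption based on older Linux defaults. Nothing in this message confirms the server's OS or its settings, and the helper names are made up for illustration.

```python
from datetime import datetime

# Timestamp layout used in the Amanda debug lines above.
FMT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(start, end):
    """Whole seconds between two debug-log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


def syn_wait(initial_rto, retries):
    """Total wait for the first SYN plus `retries` retransmissions,
    with the wait doubling after each attempt.

    ASSUMPTION: models an exponential-backoff schedule; the real values
    on the Amanda server are not known from the message.
    """
    return sum(initial_rto * 2 ** i for i in range(retries + 1))


krb5 = elapsed_seconds("Tue Oct 14 16:44:06 2014", "Tue Oct 14 16:47:15 2014")
bsdtcp = elapsed_seconds("Tue Oct 14 16:35:20 2014", "Tue Oct 14 16:38:29 2014")
assumed = syn_wait(initial_rto=3, retries=5)  # assumed older Linux defaults

print(krb5, bsdtcp, assumed)   # seconds per connect attempt
print(3 * krb5 / 60)           # minutes with connect-tries = 3
print(2 * krb5 / 60)           # minutes with connect-tries = 2
```

Both auth types wait the same number of seconds per attempt, which fits a system-level connect timeout rather than an Amanda parameter. If the assumed schedule applies, multiplying by connect-tries gives roughly the 9 and 6 minute freezes reported.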

Deb Baddorf
Fermilab
