On 10/14/2014 06:14 PM, Debra S Baddorf wrote:
( I’ve joined the amanda-hackers list too. Would this be better there? )
Since amanda 3.3.3 (when I started using bsdtcp and krb5 auth types, in
addition to bsd), I’m getting a 3 minute timeout on connection to a node
that is down. I’ve deduced that I can lower “connect-tries” from the
default of 3 down to 2. This reduces my freeze time from 9 minutes to 6.
I’m not sure lowering “connect-tries” to 1 is safe.
Where is the 3 minute timeout coming from?
It comes from the connect system call.
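To illustrate (a rough sketch in Python, not Amanda code): a blocking connect() with no application-level timeout waits until the kernel gives up on the connection attempt, which is what a down host triggers. Putting a timeout on the connect itself bounds that wait. The helper name `probe` and the 5 second default are hypothetical, chosen only for this example.

```python
import socket

def probe(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection applies `timeout` to the connect() call itself,
        # so a silent (down) host cannot block longer than that.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

Running something like `probe("client.example.org", 10080)` against each client before a run would flag a down node in seconds rather than minutes; the host name and port here are placeholders.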
It doesn’t seem to be etimeout (which I’ve got set at 29 minutes)
nor dtimeout (set to 30 minutes)
nor ctimeout (set to 30 seconds, and I *did* try changing this one).
Is this a non-amanda system level delay? My expert & I can’t think of one.
Is there another timeout parameter in amanda that I can play with?
I’ve got 39 nodes to back up, so I don’t always know when one is down for
some reason. And with 39 clients, somehow the 9 minute freeze time (before
I lowered the TRIES to 2) manages to make OTHER clients fail too, even
though they are running fine. This is actually the part that bothers me!
PS this 3 minute per try timeout was timed using amcheck but happens
during amdump too. Both amcheck and amdump have other nodes failing if one
node is down.
PPS the connect failure is between these two lines (with debug-auth and
debug-protocol both set to 5):
KRB5 node down:
Tue Oct 14 16:44:06 2014: thd-0x8fc6340: amcheck-clients: connect_port: Try port 50000: available - Success
Tue Oct 14 16:47:15 2014: thd-0x8fc6340: amcheck-clients: connect_portrange: Connect from 0.0.0.0:50000 failed: Connection timed out
BSDTCP node down:
Tue Oct 14 16:35:20 2014: thd-0x86e8340: amcheck-clients: connect_port: Try port 517: available - Success
Tue Oct 14 16:38:29 2014: thd-0x86e8340: amcheck-clients: connect_portrange: Connect from 0.0.0.0:517 failed: Connection timed out
Deb Baddorf
Fermilab
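
For what it's worth, the timestamps in the two log excerpts above can be checked with a few lines of Python. The sketch below computes the elapsed time of each failed connect and compares it with the total wait a TCP stack would accumulate when retransmitting an unanswered SYN with exponential backoff. The 3 second initial timeout and 5 retries are assumptions (they match older Linux kernel defaults, where the retry count is the `net.ipv4.tcp_syn_retries` sysctl); nothing in the thread states which OS or settings the server uses.

```python
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y"  # timestamp layout used in the amcheck debug log

def elapsed_seconds(start, end):
    """Seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())

# Timestamps copied from the log lines quoted in the message.
krb5 = elapsed_seconds("Tue Oct 14 16:44:06 2014", "Tue Oct 14 16:47:15 2014")
bsdtcp = elapsed_seconds("Tue Oct 14 16:35:20 2014", "Tue Oct 14 16:38:29 2014")

# ASSUMPTION: initial SYN timeout of 3 s, doubled after each of 5 retries.
INITIAL_RTO = 3
SYN_RETRIES = 5
syn_backoff = sum(INITIAL_RTO * 2**k for k in range(SYN_RETRIES + 1))

print(krb5, bsdtcp, syn_backoff)  # 189 189 189

# Total freeze per down node for each connect-tries setting.
for tries in (3, 2, 1):
    print(f"connect-tries={tries}: {tries * krb5} seconds")
```

Both connects fail after exactly 189 seconds, which is what the assumed backoff schedule adds up to (3 + 6 + 12 + 24 + 48 + 96). That is consistent with the delay being a kernel-level TCP timeout rather than an Amanda parameter, though it does not prove it.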