(I’ve joined the amanda-hackers list too. Would this be better there?)
Since amanda 3.3.3 (when I started using the bsdtcp and krb5 auth types, in addition to bsd), I’ve been getting a 3-minute timeout when connecting to a node that is down. I’ve found that I can lower “connect-tries” from its default of 3 to 2, which cuts my freeze time from 9 minutes to 6. I’m not sure lowering “connect-tries” to 1 is safe.

Where is the 3-minute timeout coming from? It doesn’t seem to be etimeout (which I have set to 29 minutes), dtimeout (set to 30 minutes), or ctimeout (set to 30 seconds; I *did* try changing this one). Is this a system-level delay outside of amanda? My expert and I can’t think of one. Is there another timeout parameter in amanda that I can adjust?

I have 39 nodes to back up, so I don’t always know when one is down for some reason. And with 39 clients, the 9-minute freeze (before I lowered connect-tries to 2) somehow makes OTHER clients fail too, even though they are running fine. This is the part that actually bothers me!

PS: I timed this 3-minute-per-try timeout with amcheck, but it happens during amdump too. With both amcheck and amdump, other nodes fail if one node is down.

PPS: The connect failure happens between these two lines (with debug-auth and debug-protocol both set to 5):

KRB5 node down:
Tue Oct 14 16:44:06 2014: thd-0x8fc6340: amcheck-clients: connect_port: Try port 50000: available - Success
Tue Oct 14 16:47:15 2014: thd-0x8fc6340: amcheck-clients: connect_portrange: Connect from 0.0.0.0:50000 failed: Connection timed out

BSDTCP node down:
Tue Oct 14 16:35:20 2014: thd-0x86e8340: amcheck-clients: connect_port: Try port 517: available - Success
Tue Oct 14 16:38:29 2014: thd-0x86e8340: amcheck-clients: connect_portrange: Connect from 0.0.0.0:517 failed: Connection timed out

Deb Baddorf
Fermilab
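The timing in the log excerpts above can be checked with a small Python sketch. It measures each try from the two timestamps (189 seconds, or 3 minutes 9 seconds, for both auth types) and multiplies by connect-tries to get the worst-case freeze per dead client. Purely as an unconfirmed assumption, it also computes the classic Linux SYN-retransmission budget: if the server running amcheck used an older kernel with net.ipv4.tcp_syn_retries = 5 and a 3-second initial retransmission timeout, connect() would give up after 3 + 6 + 12 + 24 + 48 + 96 = 189 seconds. The helper names are hypothetical; nothing here is amanda code.

```python
from datetime import datetime

# Timestamps copied from the two debug-log excerpts in the post.
LOG_PAIRS = {
    "krb5": ("Tue Oct 14 16:44:06 2014", "Tue Oct 14 16:47:15 2014"),
    "bsdtcp": ("Tue Oct 14 16:35:20 2014", "Tue Oct 14 16:38:29 2014"),
}
FMT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between the 'Try port' line and the 'Connection timed out' line."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


def freeze_seconds(per_try: int, connect_tries: int) -> int:
    """Worst-case wait on one dead client, assuming every try times out in full."""
    return per_try * connect_tries


def syn_backoff_seconds(initial_rto: int, syn_retries: int) -> int:
    """ASSUMPTION (not confirmed for this host): the kernel retransmits the SYN
    syn_retries times, doubling the wait each time starting at initial_rto,
    then waits one more doubled interval before reporting a timeout."""
    return sum(initial_rto * 2**i for i in range(syn_retries + 1))


if __name__ == "__main__":
    for auth, (start, end) in LOG_PAIRS.items():
        per_try = elapsed_seconds(start, end)
        print(f"{auth}: one try took {per_try} s")
        for tries in (3, 2, 1):
            minutes, seconds = divmod(freeze_seconds(per_try, tries), 60)
            print(f"  connect-tries={tries}: up to {minutes} min {seconds} s")
    # Hypothetical kernel settings, for comparison only.
    print("SYN backoff (rto=3 s, retries=5):", syn_backoff_seconds(3, 5), "s")
```

If that assumption holds on the amanda server, the 3-minute figure would come from the TCP stack rather than from an amanda parameter; `sysctl net.ipv4.tcp_syn_retries` shows the value actually in use.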
