Amanda experts:
I’m following up on my earlier woes, in which a TCP-flavored client (auth KRB5
in this case) being offline causes other backups to fail. I just noticed that
all the failing nodes are
(a) UDP-type nodes, i.e. auth=bsd
(b) failing in the estimate phase
Why would one node affect the others?
PS Why, when node ACSY (names changed to protect the innocent) failed to
connect initially, does Amanda still try to reconnect and do estimates?
Wouldn’t an initial failure (during the KRB5 privilege negotiation) cancel
the whole process for that node?
Perhaps I should set hostp->up = HOST_DONE after the first connection
failure? Then it wouldn’t affect other nodes at a later stage in the
process. Right? That’d be an easy code insertion. (I think)
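A minimal stand-alone sketch of that idea, not a patch against the Amanda
source: the host_t struct, the host_state_t enum, and the try_connect() and
run_passes() helpers are all hypothetical stand-ins, and only the names
hostp->up and HOST_DONE come from the suggestion above. It just shows that
once a host is marked done on its first failed connection, later passes stop
sending requests to it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT copied from Amanda's planner code. */
typedef enum { HOST_READY, HOST_ACTIVE, HOST_DONE } host_state_t;

typedef struct host {
    const char  *name;
    host_state_t up;        /* mirrors hostp->up; the real field type may differ */
    int          reachable; /* simulation only: 1 if the client answers */
    int          attempts;  /* simulation only: connection attempts made so far */
} host_t;

/* Try to contact one host. On the first failure, mark it HOST_DONE so that
 * no later pass sends it another request. Returns 1 on success, 0 otherwise. */
static int try_connect(host_t *hostp)
{
    assert(hostp != NULL);
    if (hostp->up == HOST_DONE)
        return 0;               /* already given up: make no new request */
    hostp->attempts++;
    if (!hostp->reachable) {
        hostp->up = HOST_DONE;  /* the proposed one-line insertion */
        return 0;
    }
    hostp->up = HOST_ACTIVE;
    return 1;
}

/* Run `passes` rounds over all hosts and return the total number of
 * connection attempts actually made. */
static int run_passes(host_t *hosts, size_t nhosts, int passes)
{
    int total = 0;
    for (int p = 0; p < passes; p++) {
        for (size_t i = 0; i < nhosts; i++) {
            int before = hosts[i].attempts;
            try_connect(&hosts[i]);
            total += hosts[i].attempts - before;
        }
    }
    return total;
}
```

In this toy model a down host costs exactly one attempt no matter how many
passes follow. Whether the real planner has a single place where such an
assignment would take effect is an assumption that would need checking
against the source.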
Deb Baddorf
Fermilab
=========== (Sorry, the debug logs are gone by now. I could re-create them, though, by doing it again.) ================
These dumps were to tape ad-LTO2-daily-117.
The next 4 tapes Amanda expects to use are: ad-LTO2-daily-118,
ad-LTO2-daily-119, ad-LTO2-daily-120, ad-LTO2-daily-121.
FAILURE DUMP SUMMARY:
planner: ERROR Request to ACSY failed: Connection timed out   <<<<<<<<< THIS NODE IS DOWN
CHAB WWWdata lev 0 FAILED [too many dumper retry: [request failed: timeout waiting for REP]]
ACSY / lev 0 FAILED [Request to ACSY failed: Connection timed out]   <<<<<<<< YET IT STILL TRIES IT, AND BOTHERS OTHERS
ACSY /boot lev 0 FAILED [Request to ACSY failed: Connection timed out]   <<<<<<<< IN THIS ESTIMATE PHASE
ACSY /data lev 0 FAILED [Request to ACSY failed: Connection timed out]   <<<<<<<<
ADES / lev 0 FAILED [Some estimate timeout on ADES]
ADES /home lev 0 FAILED [Some estimate timeout on ADES]
ADES /opt lev 0 FAILED [Some estimate timeout on ADES]
ADES /usr lev 0 FAILED [Some estimate timeout on ADES]
ADES /var lev 0 FAILED [Some estimate timeout on ADES]
ADES /boot lev 0 FAILED [Some estimate timeout on ADES]
LINA / lev 0 FAILED [Some estimate timeout on LINA]
LINA /var lev 0 FAILED [Some estimate timeout on LINA]
LINA /usr lev 0 FAILED [Some estimate timeout on LINA]
LINA /data lev 0 FAILED [Some estimate timeout on LINA]
ANIM / lev 0 FAILED [Some estimate timeout on ANIM]
ANIM /var lev 0 FAILED [Some estimate timeout on ANIM]
ANIM /usr/local/www lev 0 FAILED [Some estimate timeout on ANIM]
ANIM /usr/home lev 0 FAILED [Some estimate timeout on ANIM]
ANIM /usr/local/apache-tomcat-7.0 lev 0 FAILED [Some estimate timeout on ANIM]
BINA / lev 0 FAILED [Some estimate timeout on BINA]
BINA /var lev 0 FAILED [Some estimate timeout on BINA]
GRAV / lev 0 FAILED [Some estimate timeout on GRAV]
GRAV /var lev 0 FAILED [Some estimate timeout on GRAV]
GUMB / lev 0 FAILED [Some estimate timeout on GUMB]
GUMB /esh lev 0 FAILED [Some estimate timeout on GUMB]
GUMB /home lev 0 FAILED [Some estimate timeout on GUMB]
GUMB /opt lev 0 FAILED [Some estimate timeout on GUMB]
GUMB /usr lev 0 FAILED [Some estimate timeout on GUMB]
GUMB /var lev 0 FAILED [Some estimate timeout on GUMB]
PROT / lev 0 FAILED [Some estimate timeout on PROT]
PROT /var lev 0 FAILED [Some estimate timeout on PROT]
QUAS / lev 0 FAILED [Some estimate timeout on QUAS]
QUAS /var lev 0 FAILED [Some estimate timeout on QUAS]
CHAB WWWdata lev 0 FAILED [cannot read header: got 0 bytes instead of 32768]
CHAB WWWdata lev 0 FAILED [cannot read header: got 0 bytes instead of 32768]