On 10/17/2014 05:24 PM, Debra S Baddorf wrote:
Amanda experts:
I’m trying to follow up on my woes wherein a TCP-flavored client (auth KRB5 in
this case) being offline causes other backups to fail. I just noticed that all
the failing nodes are:
(a)  UDP-type nodes, i.e. auth=bsd
(b)  failing in the estimate phase

Why would one node affect the others?
Because the connect system call hangs for a long time, and all the other clients exceed their timeouts.

PS: Why, when node ACSY (names changed to protect the innocent) failed to
connect initially, does Amanda still try to reconnect and do estimates?
Wouldn’t an initial failure (during the KRB5 privilege negotiation) cancel
the whole process for that node?

     Perhaps I should set hostp->up = HOST_DONE after the first connection
failure? Then it wouldn’t affect other nodes at a later stage in the process.
Right? That’d be an easy code insertion (I think).
You can try it, but only do it for the 'Connection timed out' error.
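A minimal sketch of the idea discussed above, assuming a simplified model rather than Amanda's real planner code: only the field `hostp->up` and the value `HOST_DONE` come from the thread. The `host_t` struct, the `HOST_READY`/`HOST_ACTIVE` states, and the functions `is_connect_timeout`, `on_request_failed` and `should_send_request` are hypothetical illustrations. It marks a host done only when the error text contains 'Connection timed out', as Jean-Louis suggests, and leaves other failures eligible for the normal retry path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host states; only the HOST_DONE name comes from the thread,
   the others are assumptions for this sketch, not Amanda's definitions. */
typedef enum { HOST_READY, HOST_ACTIVE, HOST_DONE } host_state;

/* Hypothetical stand-in for Amanda's per-host record. */
typedef struct {
    const char *hostname;
    host_state  up;
} host_t;

/* Returns 1 if the error text reports a connect timeout, else 0. */
int is_connect_timeout(const char *errstr)
{
    return errstr != NULL && strstr(errstr, "Connection timed out") != NULL;
}

/* Hypothetical failure hook: called when a request to hostp fails. */
void on_request_failed(host_t *hostp, const char *errstr)
{
    assert(hostp != NULL);
    if (is_connect_timeout(errstr)) {
        /* Host is unreachable: stop scheduling further requests to it
           so later phases do not block on it again. */
        hostp->up = HOST_DONE;
    }
    /* Any other error leaves the state unchanged, so normal retries apply. */
}

/* Hypothetical scheduler check: only READY hosts get new requests. */
int should_send_request(const host_t *hostp)
{
    return hostp->up == HOST_READY;
}
```

For example, after `on_request_failed(&acsy, "Request to ACSY failed: Connection timed out")`, `should_send_request(&acsy)` returns 0, while a host that failed with an estimate timeout stays `HOST_READY`. Restricting the check to the connect-timeout text keeps transient errors from permanently excluding a host for the run.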

Jean-Louis


Deb Baddorf
Fermilab

===========  (sorry, the debug logs are gone by now; I could re-create them
by doing it again, though) ================

These dumps were to tape ad-LTO2-daily-117.
The next 4 tapes Amanda expects to use are: ad-LTO2-daily-118, ad-LTO2-daily-119, ad-LTO2-daily-120, ad-LTO2-daily-121.
FAILURE DUMP SUMMARY:
planner: ERROR Request to ACSY failed: Connection timed out    <<<<<<<<< THIS NODE IS DOWN
CHAB WWWdata lev 0  FAILED [too many dumper retry: [request failed: timeout waiting for REP]]
ACSY / lev 0  FAILED [Request to ACSY failed: Connection timed out]    <<<<<<<< YET IT STILL TRIES IT, AND BOTHERS OTHERS
ACSY /boot lev 0  FAILED [Request to ACSY failed: Connection timed out]    <<<<<<<< IN THIS ESTIMATE PHASE
ACSY /data lev 0  FAILED [Request to ACSY failed: Connection timed out]    <<<<<<<<
ADES / lev 0  FAILED [Some estimate timeout on ADES]
ADES /home lev 0  FAILED [Some estimate timeout on ADES]
ADES /opt lev 0  FAILED [Some estimate timeout on ADES]
ADES /usr lev 0  FAILED [Some estimate timeout on ADES]
ADES /var lev 0  FAILED [Some estimate timeout on ADES]
ADES /boot lev 0  FAILED [Some estimate timeout on ADES]
LINA / lev 0  FAILED [Some estimate timeout on LINA]
LINA /var lev 0  FAILED [Some estimate timeout on LINA]
LINA /usr lev 0  FAILED [Some estimate timeout on LINA]
LINA /data lev 0  FAILED [Some estimate timeout on LINA]
ANIM / lev 0  FAILED [Some estimate timeout on ANIM]
ANIM /var lev 0  FAILED [Some estimate timeout on ANIM]
ANIM /usr/local/www lev 0  FAILED [Some estimate timeout on ANIM]
ANIM /usr/home lev 0  FAILED [Some estimate timeout on ANIM]
ANIM /usr/local/apache-tomcat-7.0 lev 0  FAILED [Some estimate timeout on ANIM]
BINA / lev 0  FAILED [Some estimate timeout on BINA]
BINA /var lev 0  FAILED [Some estimate timeout on BINA]
GRAV / lev 0  FAILED [Some estimate timeout on GRAV]
GRAV /var lev 0  FAILED [Some estimate timeout on GRAV]
GUMB / lev 0  FAILED [Some estimate timeout on GUMB]
GUMB /esh lev 0  FAILED [Some estimate timeout on GUMB]
GUMB /home lev 0  FAILED [Some estimate timeout on GUMB]
GUMB /opt lev 0  FAILED [Some estimate timeout on GUMB]
GUMB /usr lev 0  FAILED [Some estimate timeout on GUMB]
GUMB /var lev 0  FAILED [Some estimate timeout on GUMB]
PROT / lev 0  FAILED [Some estimate timeout on PROT]
PROT /var lev 0  FAILED [Some estimate timeout on PROT]
QUAS / lev 0  FAILED [Some estimate timeout on QUAS]
QUAS /var lev 0  FAILED [Some estimate timeout on QUAS]
CHAB WWWdata lev 0  FAILED [cannot read header: got 0 bytes instead of 32768]
CHAB WWWdata lev 0  FAILED [cannot read header: got 0 bytes instead of 32768]
