amandad has a hard limit of 6h (see REP_TIMEOUT in amandad-src/amandad.c)
when waiting for the reply from sendsize.

Try the attached patch; it resets the timeout after each estimate.

Jean-Louis

Jean-Francois Malouin wrote:
Hi,

A new problem that has me stumped: all the amdumps from client to server
(same host running 2.5.2-20070623) have failed because the estimates timed
out after 6:00h. This happened in all of the multiple configs that I run,
even though the etimeout in each amanda config is set to a
ridiculous value: in one case etimeout=5600 and I have 77 DLEs, which
should not time out for ~120h! Could anything else cause this?

FAILURE AND STRANGE DUMP SUMMARY:
  yorick  /data/bigml/bigml1                  lev 0  FAILED [disk /data/bigml/bigml1, all estimate timed out]
...
  yorick  /data/nih/nih1/                     lev 0  FAILED [disk /data/nih/nih1/, all estimate timed out]
 planner: ERROR Request to yorick failed: EOF on read from yorick


STATISTICS:
                          Total       Full      Incr.
                        --------   --------   --------
Estimate Time (hrs:min)    6:00
Run Time (hrs:min)        15:07
Dump Time (hrs:min)       15:14      14:59       0:15


jf

diff -u -r --show-c-function --new-file --exclude-from=/home/martinea/src.orig/amanda.diff --ignore-matching-lines='$Id:' amanda-2.5.2p1/amandad-src/amandad.c amanda-2.5.2p1.amandad/amandad-src/amandad.c
--- amanda-2.5.2p1/amandad-src/amandad.c	2007-05-04 07:39:06.000000000 -0400
+++ amanda-2.5.2p1.amandad/amandad-src/amandad.c	2007-06-11 09:56:32.000000000 -0400
@@ -901,8 +901,13 @@ s_repwait(
 	    do_sendpkt(as->security_handle, &as->rep_pkt);
 	    amfree(as->rep_pkt.body);
 	    pkt_init_empty(&as->rep_pkt, P_REP);
-	}
  
+	    assert(as->ev_reptimeout != NULL);
+	    event_release(as->ev_reptimeout);
+	    as->ev_reptimeout = event_register(REP_TIMEOUT, EV_TIME,
+		timeout_repfd, as);
+	}
+
 	return (A_PENDING);
     }
 
