* Jean-Louis Martineau <[EMAIL PROTECTED]> [20070611 10:00]:
> amandad has a hard limit of 6h (see REP_TIMEOUT in amandad-src/amandad.c)
> while waiting for the reply from sendsize.
>
> Try the attached patch; it resets the timeout after each estimate.

Thanks Jean-Louis. Would that explain why I see a lot of runaway
processes after sendsize times out? Over the weekend I had a situation
where more than 90 gnutar processes were left around with init as
parent, like the following:

     UID      PID  PPID  C    STIME TTY   TIME CMD
    root 23243074     1  0 16:22:41 ?    11:40 gtar --create --file - --directory /data/mafalda/mafalda1/susanita/jen/anxiety_

The relevant debug file showed:

runtar.20070610162241.debug

runtar: debug 1 pid 23243074 ruid 666 euid 0: start at Sun Jun 10 16:22:41 2007
runtar: time 0.002: version 2.5.2-20070523
/usr/freeware/bin/tar version: tar (GNU tar) 1.13.25
config: stk_80-conf1
runtar: debug 1 pid 23243074 ruid 0 euid 0: rename at Sun Jun 10 16:22:41 2007
running: /usr/freeware/bin/tar: 'gtar' '--create' '--file' '-' '--directory' '/data/mafalda/mafalda1/susanita/jen/anxiety_version1/sub115' '--one-file-system' '--listed-incremental' '/opt/amanda/amanda1/var/amanda/gnutar-lists/yoricksub115_1.new' '--sparse' '--ignore-failed-read' '--totals' '.'
runtar: time 0.020: pid 23243074 finish time Sun Jun 10 16:22:41 2007

I've seen this with both xfsdump and gnutar.

thanks,
jf

>
> Jean-Louis
>
> Jean-Francois Malouin wrote:
> >Hi,
> >
> >A new problem that has me stumped: all the amdumps from client to server
> >(same host running 2.5.2-20070623) have failed due to the estimate timing
> >out after 6:00h. This happened in all the multiple configs that I run,
> >even though the etimeout in each of the amanda configs is set to a
> >ridiculous value: in one case etimeout=5600 and I have 77 DLEs, which
> >should not time out for ~120h! Could anything else cause this?
> >
> >FAILURE AND STRANGE DUMP SUMMARY:
> >  yorick /data/bigml/bigml1 lev 0 FAILED [disk /data/bigml/bigml1, all estimate timed out]
> >...
> >  yorick /data/nih/nih1/ lev 0 FAILED [disk /data/nih/nih1/, all estimate timed out]
> >  planner: ERROR Request to yorick failed: EOF on read from yorick
> >
> >
> >STATISTICS:
> >                          Total       Full      Incr.
> >                        --------   --------   --------
> >Estimate Time (hrs:min)    6:00
> >Run Time (hrs:min)        15:07
> >Dump Time (hrs:min)       15:14      14:59       0:15
> >
> >
> >jf
> >
>

-- 
<° ><
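
For reference, the arithmetic behind the two limits discussed in this thread can be sketched in Python. This is only an illustration, not Amanda code: the function names are hypothetical, the per-DLE reading of etimeout is the poster's own assumption, and modelling the outcome as "whichever limit is smaller wins" is a simplification of what Jean-Louis describes for REP_TIMEOUT.

```python
# Sketch only: the numbers come from the thread; the min() model is an assumption.
REP_TIMEOUT_S = 6 * 3600  # amandad's hard limit as quoted in the thread (6h)


def planner_budget_s(etimeout_s: int, n_dles: int) -> int:
    """Total estimate budget if etimeout applies per DLE, as the poster assumes."""
    return etimeout_s * n_dles


def effective_timeout_s(etimeout_s: int, n_dles: int,
                        rep_timeout_s: int = REP_TIMEOUT_S) -> int:
    """Whichever limit expires first decides when the estimate is abandoned."""
    return min(planner_budget_s(etimeout_s, n_dles), rep_timeout_s)


budget = planner_budget_s(5600, 77)
print(f"planner budget:  {budget} s (~{budget / 3600:.1f} h)")
print(f"effective limit: {effective_timeout_s(5600, 77) / 3600:.1f} h")
```

With etimeout=5600 and 77 DLEs the planner budget comes to roughly 120h, but under this model the 6h cap is reached first, which would be consistent with the "Estimate Time 6:00" line in the report.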
