I had a problem with one of my amanda servers not hearing back from one of the clients during the estimate phase. While it was my fault (for non-amanda reasons I'd started amdump during daylight hours, when the machine was more heavily loaded),
I was able to get a very good idea of how long it took the estimate phase to complete by examining the /tmp/amanda files on the client. I'm not recommending this as a practice, but as long as the info is in the log files...

The estimate phase completes when the last child terminates - AFAIK, YMMV

[samar] /tmp/amanda 104> grep termin sendsize.20050516210006.debug
sendsize[5891]: time 6.609: child 6150 terminated normally
sendsize[6146]: time 7.461: asking killpgrp to terminate
sendsize[6146]: time 23.051: asking killpgrp to terminate
sendsize[5891]: time 24.063: child 6146 terminated normally
sendsize[6149]: time 30.378: asking killpgrp to terminate
sendsize[6149]: time 68.221: asking killpgrp to terminate
sendsize[5891]: time 69.232: child 6149 terminated normally
sendsize[5891]: time 143.366: child 6175 terminated normally
sendsize[5891]: time 300.214: child 6186 terminated normally
sendsize[5891]: time 356.362: child 6207 terminated normally
sendsize[5891]: time 1341.432: child 6160 terminated normally
sendsize[5891]: time 1796.592: child 6166 terminated normally
sendsize[5891]: time 1797.537: child 6364 terminated normally
sendsize[5891]: time 1820.820: child 6367 terminated normally
sendsize[5891]: time 1820.944: child 6376 terminated normally
sendsize[5891]: time 1924.346: child 6216 terminated normally
sendsize[5891]: time 2810.064: child 6379 terminated normally
sendsize[5891]: time 3041.239: child 6151 terminated normally
sendsize[5891]: time 3617.385: child 6320 terminated normally

On Thu, May 19, 2005 at 11:12:09AM -0400, Jon LaBadie wrote:
> On Thu, May 19, 2005 at 10:47:26AM -0400, Guy Dallaire wrote:
> > Here is what I have in my amanda log this morning:
> >
> > FAILURE AND STRANGE DUMP SUMMARY:
> >   planner: ERROR Estimate timeout from sol
> >   sol /data2 lev 0 FAILED [disk /data2, all estimate timed out]
> >   sol /data1 lev 0 FAILED [disk /data1, all estimate timed out]
> >   sol /disk1 lev 0 FAILED [disk /disk1, all estimate timed out]
> >   sol / lev 0 FAILED [disk /, all estimate timed out]
> >
> > What might be wrong here ? The first time I ran amanda, it backed up
> > this server without a problem. The timeout parameter is at the
> > standard 300 secs. I bumped it up to 600 seconds for the next run but
> > I'm worried.
> >
> > I did not change anything to the config, except maybe yesterday I did
> > an "amdamin force sol /" because I decided to start using gnu-tar
> > instead of ufsdump on all my root file systems.
>
> raise them way up, say 6000 sec to just to see if it simply is slow.
>
> BTW in general it is best to introduce a lot of things a few DLE at
> a time. This avoids the problem of massive level 0's all in one dump.
> Spread them out like amanda will eventually. Add a couple from sol,
> a couple from mercury, one or two from venus, ... Then tomorrow
> a few more.
>
> --
> Jon H. LaBadie [EMAIL PROTECTED]
> JG Computing
> 4455 Province Line Road (609) 252-0159
> Princeton, NJ 08540-4322 (609) 683-7220 (fax)

---
Brian R Cuttler                 [EMAIL PROTECTED]
Computer Systems Support        (v) 518 486-1697
Wadsworth Center                (f) 518 473-6384
NYS Department of Health        Help Desk 518 473-0773
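
For anyone who would rather not eyeball the grep output above, here is a minimal sketch that pulls out the largest "terminated normally" timestamp from a sendsize debug file. It assumes, as I did above, that the "time" field counts seconds since sendsize started and that the last child to finish marks the end of the estimate phase. The function name estimate_elapsed is made up for this example; it is not part of amanda.

```shell
# Hypothetical helper (not an amanda tool): print the largest "time" value
# found on "child N terminated normally" lines of the given debug file(s).
# Assumes the line layout shown in the grep output above:
#   sendsize[PID]: time SECONDS: child PID terminated normally
estimate_elapsed() {
    awk '/child [0-9]+ terminated normally/ {
             t = $3 + 0              # "3617.385:" converts to 3617.385
             if (t > max) max = t
         }
         END { printf "%.3f\n", max }' "$@"
}

# Example use on the client:
#   estimate_elapsed /tmp/amanda/sendsize.20050516210006.debug
```

If that number comes out well above the etimeout in effect on the server, the timeout is the likely culprit. Given several files, the function reports a single maximum across all of them, so run it on one debug file at a time.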
