* Garnet Harris <[email protected]> [20091030 13:33]:
> I am experimenting with amanda and am having problems understanding how
> the "estimate timeout" works. What I am seeing is not what I expect
> based on the explanation in the amanda.conf page.
>
> Client and server are on the same machine.
>
> 36 DLEs (some set to calcsize, most use tar for estimates)
>
> All DLEs are set to "always full" (to remove any confusion caused by
> incremental backups).
>
> etimeout = 900
>
> My understanding is that amanda will allow 900 seconds per DLE, so it
> should time out after 32,400 seconds (900 x 36), or 9 hours. According
> to the report after amdump runs, the estimate phase took over 11 hours.
>
> Looking at the sendsize log, the last "estimate time for" entry, for the
> 28th DLE, is at time 30,708. The timestamp on the sendsize file matches:
> approximately 8.5 hours after amanda started. (So far so good.)
> However, the first runtar log doesn't appear until another 1.5 hours
> later, which means amanda didn't do anything for 1.5 hours. (The lag
> is greater when incremental backups are allowed.)
>
> Looking at the planner log on the server side, there is a "dgram_recv"
> with a matching timestamp for each "estimate time" entry on the client
> side for the first 27 DLEs.
>
> client sendsize log:
>
> sendsize[8661]: time 9899.506: estimate time for home_q level 0: 12.033
> sendsize[8668]: time 13962.655: estimate time for home_r level 0: 4063.141
> sendsize[9335]: time 30708.356: estimate time for home_s level 0: 16745.578
>
>
> server planner log:
>
> time 9899.665: dgram_recv(dgram=0xb805c764, timeout=0, fromaddr=0xb806c750)
> time 9899.665: (sockaddr_in6 *)0xb806c750 = { 10, 10080, ::ffff:192.168.0.247 }
> time 13962.833: dgram_recv(dgram=0xb805c764, timeout=0, fromaddr=0xb806c750)
> time 13962.833: (sockaddr_in6 *)0xb806c750 = { 10, 10080, ::ffff:192.168.0.247 }
> time 21600.190: dgram_recv(dgram=0xb805c764, timeout=0, fromaddr=0xb806c750)
> time 21600.211: (sockaddr_in6 *)0xb806c750 = { 10, 10080, ::ffff:192.168.0.247 }
> time 40082.633: security_seterror(handle=0x80721f8, driver=0xb804a720 (BSD) error=timeout waiting for REP)
> time 40082.665: security_close(handle=0x80721f8, driver=0xb804a720 (BSD))
> time 40082.719: pid 6922 finish time Sun Oct 25 19:08:03 2009
>
>
> Something is happening at 21600 (6 hours). The server receives a dgram
> from somewhere (there is no corresponding entry in the sendsize log) and
> stops looking for estimates from the client. It then waits another
> 5 hours before it starts the actual backup.
>
> Any idea what is happening at 21600?
Been there, seen that. Look at REP_TIMEOUT = (6*60*60), set in
amandad-src/amandad.c, i.e. 6 hours. That is what you are seeing. You
either have to figure out why your clients are so slow or recompile the
client with a bigger REP_TIMEOUT.

I'm sure Dustin or Jean-Louis will chime in if I'm mistaken.

hth,
jf

> --
> Garnet Harris                       TEL: +301 515 7118
> Eka Systems                         FAX: +301 515 4965
> 20201 Century Blvd., Suite 250
> Germantown, MD 20874                [email protected]

--
<° ><
Jean-François Malouin
McConnell Brain Imaging Centre
Systems/Network Administrator
Montréal Neurological Institute
3801 Rue Université, Suite WB219
Montréal, Québec, H3A 2B4
Phone: 514-398-8924   Fax: 514-398-8948
