Hey folks. I've been trying to solve a problem with amanda for the past few months. Until yesterday it only affected one client (out of ~10 servers). Now it's two!

Examples from the dump report:

newww.gta. / lev 0 FAILED [Request to newww.gta.com timed out.]
bento.gta. / lev 0 FAILED [Request to bento.gta.com timed out.]

bento has had the problem longer. amcheck reports no errors with bento. today, amcheck DOES report errors with newww;

WARNING: newww.gta.com: selfcheck request timed out. Host down?

Even though selfcheck...debug seems fine:

/tmp/amanda%# more selfcheck.20030102105821.debug
selfcheck: debug 1 pid 61064 ruid 2 euid 2 start time Thu Jan 2 10:58:21 2003
/usr/local/libexec/amanda/selfcheck: version 2.4.3b2
selfcheck: checking disk /var
selfcheck: device /var
selfcheck: OK
selfcheck: checking disk /usr
selfcheck: device /usr
selfcheck: OK
selfcheck: checking disk /home
selfcheck: device /home
selfcheck: OK
selfcheck: checking disk /
selfcheck: device /
selfcheck: OK
selfcheck: pid 61064 finish time Thu Jan 2 10:58:21 2003

(ran it twice to be sure..same result, same report.)

The amandad..debug for this ends with:

amandad: It's not an ack
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, giving up!
amandad: pid 61063 finish time Thu Jan 2 10:59:11 2003

---

The amandad..debug on the client when amdump runs is similar:

<clip>
amandad: sending REP packet:
----
Amanda 2.4 REP HANDLE 009-80350808 SEQ 1041408009
OPTIONS maxdumps=1;
/ 0 SIZE 46800
/ 1 SIZE 46800
/home 0 SIZE 547240
/home 1 SIZE 547240
/usr 0 SIZE 5118390
/usr 1 SIZE 5119120
/usr 2 SIZE 5119120
/var 0 SIZE 179050
/var 1 SIZE 179050
----
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, retrying
amandad: dgram_recv: timeout after 10 seconds
amandad: waiting for ack: timeout, giving up!
amandad: pid 13671 finish time Wed Jan 1 03:01:23 2003

-------

Help! I'm open to any and all suggestions.

FWIW, bento and the amanda server are on the same ethernet switch. newww and the amanda server are separated by a firewall (which is, and has been, correctly configured. Two other servers on the same network as newww still back up correctly.)

If you need any specific information from me, let me know and I can provide it. I'm not yet sure what will help you folks help me. :)

thanks.

...david

---
david raistrick
[EMAIL PROTECTED]
http://www.expita.com/nomime.html
