Robert,

We have several machines that are on multiple networks.

For instance, our amanda server "curie" received a second functional interface at one point. This is the DNS information, but it's really just one multi-homed box:

  Name:    curie.wadsworth.org
  Address: 10.50.156.66

  Name:    curieb.wadsworth.org
  Address: 199.184.30.20

This, of course, was done by the guy managing curie, and it caused half of my amanda clients to fail. So .amandahosts was modified to accept connections from either curie or curieb, depending on which network each client sat on.

Some months later, I started to notice that some amanda clients failed in a rather random manner: some nights but not others. I realized that traffic on my network was not being routed consistently, since we had created multiple paths between the single-homed clients and the multi-homed server.

I worked with my network manager but was not able to really resolve the issue. I resorted to adding BOTH "curie" and "curieb" to the .amandahosts files on the clients. This effectively resolved the issue for us. We have not seen any further problems, nor have we had time to pursue the issue further.

I realize this is a work-around and not an answer or solution, but it's the best I can currently offer.

good luck,

Brian

On Wed, Aug 31, 2011 at 10:47:15AM -0400, McGraw, Robert P wrote:
> The last couple of days we have been getting the following errors (see below).
> After some searching on the errors, I found someone suggesting running amcheck
> on these hosts. Our backups were working fine until a few days ago, but I
> decided to run amcheck on the hosts that had the errors just to be sure. Sure
> enough, amcheck failed on these hosts.
>
> One strange sequence of events was on machine hardy: I ran amcheck and it
> failed. A few minutes later when I ran it again it passed, and a few minutes
> later it failed again. Hummmm
>
> These hosts all run the same version of RedHat, and they are multi-homed:
> same host name, but different subnets/interfaces. I started wondering
> if the multi-homing could be causing the problem.
>
> So I modified my disklist to use the IP addresses of the interfaces, as
> follows:
>
>    hardy          / remote-dump-bsd -1 enet100
>    100.210.30.54  / remote-dump-bsd -1 enet100
>    100.210.40.22  / remote-dump-bsd -1 enet100
>
> I ran amcheck on each of the hostnames.
>
> [99][amandabacku@hertz]:~/daily% amcheck -c daily hardy
> Amanda Backup Client Hosts Check
> --------------------------------
> WARNING: hardy: selfcheck request failed: timeout waiting for ACK
> Client check: 1 host checked in 30.006 seconds. 1 problem found.
> (brought to you by Amanda 3.2.3)
>
>
> [100][amandabacku@hertz]:~/daily% amcheck -c daily 100.210.30.54
> Amanda Backup Client Hosts Check
> --------------------------------
> Client check: 1 host checked in 0.104 seconds. 0 problems found.
> (brought to you by Amanda 3.2.3)
>
>
> [101][amandabacku@hertz]:~/daily% amcheck -c daily 100.210.40.22
> Amanda Backup Client Hosts Check
> --------------------------------
> WARNING: 100.210.40.22: selfcheck request failed: timeout waiting for ACK
> Client check: 1 host checked in 30.006 seconds. 1 problem found.
> (brought to you by Amanda 3.2.3)
>
> The amanda client on hardy is from the RedHat distribution. I just used what
> was in the box.
>
> Not sure why all of a sudden I am getting the amcheck error on these machines.
> Network-wise, nothing has been changed.
>
> This only shows that amcheck works; it does not show that the backup will
> work.
>
> One option is to use the IP address in place of the name. Another is to
> make a CNAME for the subnets.
>
> Any comments or suggestions as to what might be going on, or am I completely
> off base?
>
> Thanks
>
> Robert
>
>
> ---------------------ERRORS-----------------------------------------
> planner: ERROR Request to bohr failed: timeout waiting for ACK
> planner: ERROR Request to hardy failed: timeout waiting for ACK
> planner: ERROR Request to leibniz failed: timeout waiting for ACK
>
> banach / lev 0 FAILED [too many dumper retry: [request failed: timeout waiting for ACK]]
> banach /boot lev 0 FAILED [too many dumper retry: [request failed: timeout waiting for ACK]]
> pythagoras / lev 0 FAILED [too many dumper retry: [request failed: timeout waiting for ACK]]
> pythagoras /boot lev 0 FAILED [too many dumper retry: [request failed: timeout waiting for ACK]]
> banach / lev 0 FAILED [cannot read header: got 0 bytes instead of 32768]
> banach / lev 0 FAILED [cannot read header: got 0 bytes instead of 32768]
> banach /boot lev 0 FAILED [cannot read header: got 0 bytes instead of 32768]
> banach /boot lev 0 FAILED [cannot read header: got 0 bytes instead of 32768]
> pythagoras / lev 0 FAILED [cannot read header: got 0 bytes instead of 32768]
> pythagoras / lev 0 FAILED [cannot read header: got 0 bytes instead of 32768]
> pythagoras /boot lev 0 FAILED [cannot read header: got 0 bytes instead of 32768]
> pythagoras /boot lev 0 FAILED [cannot read header: got 0 bytes instead of 32768]
>
>
>
> _____________________________________________________________________
> Robert P. McGraw, Jr.
> Manager, Computer System          EMAIL: rmcg...@purdue.edu
> Purdue University                  ROOM: MATH-807
> Department of Mathematics         PHONE: (765) 494-6055
> 150 N. University Street
> West Lafayette, IN 47907-2067
>
>
>

---
   Brian R Cuttler                 brian.cutt...@wadsworth.org
   Computer Systems Support        (v) 518 486-1697
   Wadsworth Center                (f) 518 473-6384
   NYS Department of Health        Help Desk 518 473-0773

IMPORTANT NOTICE: This e-mail and any attachments may contain confidential or sensitive information which is, or may be, legally privileged or otherwise protected by law from further disclosure. It is intended only for the addressee. If you received this in error or from someone who was not authorized to send it to you, please do not distribute, copy or use it or any attachments. Please notify the sender immediately by reply e-mail and delete this from your system. Thank you for your cooperation.
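For reference, here is roughly what the client-side work-around at the top of this thread looks like. It lists the multi-homed server under both of its DNS names in each client's .amandahosts file. This is a minimal sketch that assumes Amanda 3.x-style entries of the form "host user [services]", where "amdump" stands for the services a backup run needs. The server-side user name "amandabackup" is a hypothetical placeholder and does not come from the thread. Whether the short or fully qualified names are needed depends on what the client's reverse DNS lookup returns.

```
# ~/.amandahosts for the Amanda user on each client.
# Keep it owned by that user and not writable by group or others.
# List the server under every name its interfaces resolve to, so the
# access check passes whichever interface the request comes from.
# "amandabackup" is an assumed server-side user name; use your own.
curie.wadsworth.org    amandabackup  amdump
curieb.wadsworth.org   amandabackup  amdump
```

Listing both names only relaxes the access check on the client. It does not make routing between the networks consistent.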