We have a complex network of private subnets with a NAT and some limited public network presence (again on multiple subnets) for servers that ought to be able to be seen and accessed from the outside.

My Amanda servers are on the private net only. Typically, my Amanda servers are multihomed on only a couple of private subnets, but the servers that are being backed up are multihomed up the wazoo so that they can be reached from anywhere.

What has to work for Amanda is that the fully qualified names of the Amanda server going out and of the server it is backing up are consistent and predictable. If you can't count on them being consistent, and/or you don't know what those fully qualified names actually are, then you will have trouble with Amanda. If your network design is not clean, and routing is inconsistent, then you could have issues.

Just for example, if I had the Amanda server multihomed as amanda.subnet1.example.com and amanda.subnet2.example.com, and I was backing up a server that had serverx on both of those subnets as well as others, then I ought to be able to decide that Amanda backups are going to be on subnet1 and put the entries in the disklist with the subnet1 names. Then requests should go out on the amanda.subnet1.example.com interface and come back as serverx.subnet1.example.com. If serverx came back at you through subnet2, then you have an unpredictable situation that could fail. Brian's workaround could get you through that, but it's better if you can work out your network so that it is consistent and predictable.
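
One way to check that kind of consistency is to confirm that each name resolves to addresses that resolve back to the same name. Here is a minimal sketch in Python, assuming it is run on the Amanda server for the client names and on a client for the server name. The function names are made up for illustration and are not part of Amanda; the lookups are passed in as parameters so they can be swapped out, and by default they use the standard socket calls.

```python
import socket


def forward_lookup(name):
    """Return the set of IPv4 addresses the resolver gives for name."""
    _, _, addresses = socket.gethostbyname_ex(name)
    return set(addresses)


def reverse_lookup(address):
    """Return the primary name the resolver gives for address."""
    name, _, _ = socket.gethostbyaddr(address)
    return name


def check_name(name, forward=forward_lookup, reverse=reverse_lookup):
    """Return a list of problems found for one fully qualified name.

    An empty list means every address for the name maps back to that name.
    """
    try:
        addresses = forward(name)
    except OSError as exc:
        return ["%s: forward lookup failed: %s" % (name, exc)]

    problems = []
    for address in sorted(addresses):
        try:
            back = reverse(address)
        except OSError as exc:
            problems.append(
                "%s: reverse lookup of %s failed: %s" % (name, address, exc)
            )
            continue
        # Compare case-insensitively and ignore a trailing root dot.
        if back.rstrip(".").lower() != name.rstrip(".").lower():
            problems.append("%s: %s resolves back to %s" % (name, address, back))
    return problems
```

Calling check_name("serverx.subnet1.example.com") on the Amanda server, and check_name("amanda.subnet1.example.com") on serverx, should both give an empty list if the names are set up the way the example above assumes. This only tests name resolution; it says nothing about which interface the packets actually leave on.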


On 8/31/11 11:14 AM, Brian Cuttler wrote:
Robert,

We have several machines that are on multiple networks.

For instance, Amanda server "curie" received a second functional
interface at one point. This is the DNS information, but it's really
just one multi-homed box.

Name:   curie.wadsworth.org
Address: 10.50.156.66

Name:   curieb.wadsworth.org
Address: 199.184.30.20

This of course was performed by the guy managing curie and
caused half of my Amanda clients to fail, so .amandahosts was
modified to reflect connections from either curie or curieb,
depending on which network the client sat on.

At some point, months later, I started to notice that some Amanda
clients failed in a rather random manner, some nights but not
others, and I realized that my network was not being routed consistently,
since we had created multiple paths between the single-homed clients and
the multi-homed server.

I worked with my network manager and was not able to really resolve
the issue and resorted to adding BOTH "curie" and "curieb" to the
.amandahosts files on the clients.

This effectively resolved the issue for us. We have not seen any
additional issues, nor have we had time to pursue the matter further.

I realize this is a work-around and not an answer or solution,
but it's the best I can currently offer.
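
A rough sketch of how the work-around could be checked on a client, written in Python. It assumes each non-blank .amandahosts line starts with the server host name followed by the user and optional services. The function names and the "backupuser" account in the usage note are placeholders, not taken from any real setup.

```python
def hosts_in_amandahosts(text):
    """Return the lower-cased host names found in .amandahosts content.

    Assumes the first whitespace-separated field of each non-blank line
    is the host name.
    """
    hosts = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            hosts.add(fields[0].lower())
    return hosts


def missing_server_names(text, server_names):
    """Return the server names that have no entry in the given content."""
    listed = hosts_in_amandahosts(text)
    return [name for name in server_names if name.lower() not in listed]
```

Given the contents of a client's .amandahosts as a string, missing_server_names(text, ["curie.wadsworth.org", "curieb.wadsworth.org"]) returns whichever of the two names still needs a line such as "curieb.wadsworth.org backupuser amdump".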

                                                good luck,

                                                Brian


On Wed, Aug 31, 2011 at 10:47:15AM -0400, McGraw, Robert P wrote:
The last couple of days we have been getting the following errors (see below).
After some searching on the errors, I saw someone suggest running amcheck on
these hosts. Our backups were working fine until a few days ago, but I decided
to run amcheck on the hosts that had the errors just to be sure. Sure enough,
amcheck failed on these hosts.

One strange sequence of events was on machine hardy: I ran amcheck and it
failed. A few minutes later when I ran it again it passed, and a few minutes
later it failed again. Hummmm

These hosts all run the same version of RedHat, and they are multi-homed; same 
host name but they have different subnet/interfaces. I started wondering if the 
multi-home could be causing the problem.

So I modified my disklist to use the IP addresses of the interfaces, such as the
following:

hardy        /                   remote-dump-bsd  -1  enet100
100.210.30.54        /                   remote-dump-bsd  -1  enet100
100.210.40.22        /                   remote-dump-bsd  -1  enet100

I ran amcheck on each of the hostnames.

[99][amandabacku@hertz]:~/daily% amcheck -c daily hardy
Amanda Backup Client Hosts Check
--------------------------------
WARNING: hardy: selfcheck request failed: timeout waiting for ACK
Client check: 1 host checked in 30.006 seconds.  1 problem found.
(brought to you by Amanda 3.2.3)

[100][amandabacku@hertz]:~/daily% amcheck -c daily 100.210.30.54
Amanda Backup Client Hosts Check
--------------------------------
Client check: 1 host checked in 0.104 seconds.  0 problems found.
(brought to you by Amanda 3.2.3)

[101][amandabacku@hertz]:~/daily% amcheck -c daily 100.210.40.22
Amanda Backup Client Hosts Check
--------------------------------
WARNING: 100.210.40.22: selfcheck request failed: timeout waiting for ACK
Client check: 1 host checked in 30.006 seconds.  1 problem found.
(brought to you by Amanda 3.2.3)

The Amanda client on hardy is from the RedHat distribution. I just used what was in
the box.

Not sure why all of a sudden I am getting the amcheck error on these machines.
Network-wise, nothing has been changed.

This only shows that amcheck works; it does not show that the backup will work.

Some options are to use the IP address in place of the name. Another is to make 
a CNAME for the subnets.

Any comments or suggestions as to what might be going on, or am I completely off
base?

Thanks

Robert


---------------------ERRORS-----------------------------------------
   planner: ERROR Request to bohr failed: timeout waiting for ACK
   planner: ERROR Request to hardy failed: timeout waiting for ACK
   planner: ERROR Request to leibniz failed: timeout waiting for ACK

   banach / lev 0  FAILED [too many dumper retry: [request failed: timeout waiting for ACK]]
   banach /boot lev 0  FAILED [too many dumper retry: [request failed: timeout waiting for ACK]]
   pythagoras / lev 0  FAILED [too many dumper retry: [request failed: timeout waiting for ACK]]
   pythagoras /boot lev 0  FAILED [too many dumper retry: [request failed: timeout waiting for ACK]]
   banach / lev 0  FAILED [cannot read header: got 0 bytes instead of 32768]
   banach / lev 0  FAILED [cannot read header: got 0 bytes instead of 32768]
   banach /boot lev 0  FAILED [cannot read header: got 0 bytes instead of 32768]
   banach /boot lev 0  FAILED [cannot read header: got 0 bytes instead of 32768]
   pythagoras / lev 0  FAILED [cannot read header: got 0 bytes instead of 32768]
   pythagoras / lev 0  FAILED [cannot read header: got 0 bytes instead of 32768]
   pythagoras /boot lev 0  FAILED [cannot read header: got 0 bytes instead of 32768]
   pythagoras /boot lev 0  FAILED [cannot read header: got 0 bytes instead of 32768]



_____________________________________________________________________
Robert P. McGraw, Jr.
Manager, Computer System                    EMAIL: [email protected]
Purdue University                            ROOM: MATH-807
Department of Mathematics                   PHONE: (765) 494-6055
150 N. University Street
West Lafayette, IN 47907-2067



---
    Brian R Cuttler                 [email protected]
    Computer Systems Support        (v) 518 486-1697
    Wadsworth Center                (f) 518 473-6384
    NYS Department of Health        Help Desk 518 473-0773






--
---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology&  Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst

<[email protected]>

---------------

Erdös 4

