I think I am seeing the very same behavior that Lalo describes. As reported
by amcheck, clients time out inconsistently, i.e. a client times out in one
amcheck run but not in another a minute or so later. A timeout might occur
for one or two of several partitions on a client, then disappear with the
next amcheck. The checks are run on nearly idle hosts over a very quiet LAN.
From watching amstatus output frequently during amdump, it appears that the
dumpers retry failed connections at least once, and often (but not always)
succeed on the retry.

It would be great to know the fix for this behavior...
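
In the meantime, here is a rough sketch of how one might tally results per
host and partition across several runs, to check whether the failures really
are patternless. It assumes the SUCCESS/FAIL line layout shown in the server
log quoted below (status, program, host, disk, ...) and that each entry sits
on a single line in the real log file (the quoted excerpt is wrapped by the
mailer). The function name tally_results is made up for illustration; it is
not part of Amanda.

```python
from collections import Counter


def tally_results(log_lines):
    """Count SUCCESS and FAIL entries per (host, disk, status).

    Assumed line layout, taken from the quoted log excerpt:
        STATUS program host disk ...
    Lines that do not start with SUCCESS or FAIL are ignored.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 4 and fields[0] in ("SUCCESS", "FAIL"):
            status, _program, host, disk = fields[:4]
            counts[(host, disk, status)] += 1
    return counts


# Sample lines copied from the quoted log, unwrapped to one entry per line.
sample = [
    "START driver date 20020604",
    "SUCCESS dumper Client1.ucsc.edu /dev/da2s1e 20020604 2 "
    "[sec 137.657 kb 45216 kps 328.5 orig-kb 54999]",
    "FAIL driver Client2.ucsc.edu /dev/da0s1f 1 "
    "[could not connect to Client2.ucsc.edu]",
    "SUCCESS dumper Client1.ucsc.edu /dev/da3s1f 20020604 2 "
    "[sec 37.381 kb 2432 kps 65.1 orig-kb 43068]",
]

for (host, disk, status), n in sorted(tally_results(sample).items()):
    print(host, disk, status, n)
```

Feeding it the lines from several days of server logs would show whether
some host or partition fails more often than the others.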

R. Becker


On Tue, 4 Jun 2002, Lalo Castro wrote:

> Hi,
>       I am running Amanda 2.4.2p2 on a server (RH Linux 7.2) and 3 clients (2X
> FreeBSD 4.3 and 1 OpenBSD 2.9).
>       My problem is time-outs.  These time-outs follow no pattern that I
> can see.  Only one machine has ipfw enabled, and its firewall has been
> opened for Amanda.
>       Sometimes it is an entire client that times out.  More often it is a
> single partition, though 2 or 3 timed-out partitions are not unusual.
> Sometimes it is a partition on one client, sometimes on another, and
> sometimes on more than one client.
>       The Amanda dump is started by the server's cron job, so the timing
> is the same for each dump.  This makes me think that the problem is
> not clients timing out because of variable heavy processor load.
>       When I compiled Amanda, I included the portrange option.  Server and
> client have the same portrange.
>       The 24 Gb limit with Amanda has not been hit yet.  All the clients'
> hard drives add up to nearly 21 Gb, with about 40% of that disk space
> full on average.  Amanda does not do a full dump each time.
>       I have tried increasing the timeouts in amanda.conf.  I have also
> increased the bandwidth allowed to Amanda.
>       Checking the logs on client and server, it seems that the client's
> sendbackup service waits and waits in stream_accept for the server to
> connect, and never receives a connection.
>       Is there anything I can do to get rid of this?  Is there some bottleneck
> I need to be aware of that's timing out?  How can I track it down?
>       Thanks in Advance,
>                       Lalo Castro
>
> Client's sendbackup stream_accept timeout:
>
> sendbackup: try_socksize: send buffer size is 65536
> sendbackup: stream_server: waiting for connection: 0.0.0.0.10084
> sendbackup: stream_server: waiting for connection: 0.0.0.0.10085
>    waiting for connect on 10084, then 10085
> sendbackup: stream_accept: timeout after 30 seconds
> sendbackup: timeout on data port 10084
> sendbackup: stream_accept: timeout after 30 seconds
> sendbackup: timeout on mesg port 10085
> sendbackup: pid 63381 finish time Fri May 31 14:19:18 2002
>
> Amanda server's log in /var/lib/amanda at the same time:
>
> START planner date 20020604
> WARNING planner Last full dump of Client1.ucsc.edu:/dev/da2s1e on tape
> overwritten in 1 run.
> START driver date 20020604
> WARNING driver WARNING: /usr/amanda/tmp: 972800 KB requested, but only
> 804736 KB available.
> FINISH planner date 20020604
> STATS driver startup time 721.647
> SUCCESS dumper Client1.ucsc.edu /dev/da2s1e 20020604 2 [sec 137.657 kb
> 45216 kps 328.5 orig-kb 54999]
> SUCCESS dumper Client1.ucsc.edu /dev/da3s1e 20020604 1 [sec 14.887 kb 32
> kps 2.1 orig-kb 543]
> FAIL driver Client2.ucsc.edu /dev/da0s1f 1 [could not connect to
> Client2.ucsc.edu]
> SUCCESS dumper Client1.ucsc.edu /dev/da3s1f 20020604 2 [sec 37.381 kb
> 2432 kps 65.1 orig-kb 43068]
> FAIL driver Client2.library.ucsc.edu /dev/da1s1e 3 [could not connect to
> Client2.ucsc.edu]
> SUCCESS dumper Client2.library.ucsc.edu /dev/da0s1a 20020604 1 [sec
> 3.924 kb 64 kps 16.3 orig-kb 1095]
> SUCCESS dumper Client3.library.ucsc.edu /dev/wd0a 20020604 2 [sec
> 431.894 kb 29152 kps 67.5 orig-kb 231258]
> FINISH driver date 20020604 time 1153.668
>
> The rest is excerpts from the server's amanda.conf:
>
> inparallel 10           # maximum dumpers that will run in parallel
> netusage  1800 Kbps     # maximum net bandwidth for Amanda, in KB per sec
>
> dumpcycle 1 week        # the number of days in the normal dump cycle
> runspercycle 5    # the number of amdump runs in dumpcycle days
> tapecycle 10 tapes      # the number of tapes in rotation
>
> bumpsize 20 Mb          # minimum savings (threshold) to bump level 1 -> 2
> bumpdays 1              # minimum days at each level
> bumpmult 4              # threshold = bumpsize * bumpmult^(level-1)
>
> etimeout 120            # number of seconds per filesystem for
> dtimeout 180
> ctimeout 160
>
> define dumptype main_back-up {
>          global
>          comment "Backup for the main library"
>          compress client best
>          priority high
> }
>
> define dumptype back-up {
>          global
>          comment "Backup for other clients"
>          compress client best
> }
>
> define interface eth0 {
>      comment "10 Mbps ethernet"
>      use 400 kbps
> }
>
>

