Here is the scenario. I have a cluster of machines at home that I've had
Amanda working on for years. I even had to go through the protocol
conversion. It's been working well for all these years.

Last weekend I decided to move the Amanda tape/index server from an HP-UX
10.20 9000/835 to an Athlon 1.2GHz machine with a 400G holding disk.

There is one machine that I can't get to accept authentication yet (an NIS
problem, I think), but other than that amcheck runs fine. I _do_ have some
data size vs. tape size issues, since one of the things that drove me to do
this was the addition of 2 more 40G drives that are over 50% full, but I
think I've got that under control, as I am adding more tapes to the
tapecycle and doubling the dumpcycle.

However, over and above those issues, I'm seeing _a lot_ of strange
failures. Some of these are on machines that I have yet to get 2.4.3b4 to
compile on, but the example below is from a Debian GNU/Linux machine that
_is_ running 2.4.3b4:


sendbackup: debug 1 pid 8969 ruid 106 euid 106: start at Sat Jan 18 14:01:25 2003
/usr/local/amanda/libexec/sendbackup: version 2.4.3b4
  parsed request as: program `DUMP'
                     disk `hda1'
                     device `hda1'
                     level 2
                     since 2003:1:18:12:50:58
                     options `|;auth=bsd;compress-best;index;'
sendbackup: try_socksize: send buffer size is 65536
sendbackup: time 0.000: stream_server: waiting for connection: 0.0.0.0.32793
sendbackup: time 0.000: stream_server: waiting for connection: 0.0.0.0.32794
sendbackup: time 0.000: stream_server: waiting for connection: 0.0.0.0.32795
sendbackup: time 0.000: waiting for connect on 32793, then 32794, then 32795
sendbackup: time 0.002: stream_accept: connection from 205.159.77.224.2575
sendbackup: time 0.003: stream_accept: connection from 205.159.77.224.2576
sendbackup: time 0.004: stream_accept: connection from 205.159.77.224.2577
sendbackup: time 0.004: got all connections
sendbackup: time 0.004: spawning /bin/gzip in pipeline
sendbackup: argument list: /bin/gzip --best
sendbackup-dump: time 0.005: pid 8971: /bin/gzip --best
sendbackup: time 0.061: spawning /sbin/dump in pipeline
sendbackup: argument list: dump 2usf 1048576 - /dev/hda1
sendbackup: time 0.078: started index creator: "/sbin/restore -tvf - 2>&1 | sed -e '
s/^leaf[        ]*[0-9]*[       ]*\.//
t
/^dir[  ]/ {
s/^dir[         ]*[0-9]*[       ]*\.//
s%$%/%
t
}
d
'"
sendbackup: time 0.088:  91:  normal(|):   DUMP: Date of this level 2 dump: Sat Jan 18 
14:01:25 2003
sendbackup: time 0.089:  91:  normal(|):   DUMP: Date of last level 1 dump: Sat Jan 18 
07:50:59 2003
sendbackup: time 0.090:  91:  normal(|):   DUMP: Dumping /dev/hda1 (/) to standard 
output
sendbackup: time 0.091:  91:  normal(|):   DUMP: Added inode 7 to exclude list (resize 
inode)
sendbackup: time 0.264:  91:  normal(|):   DUMP: Label: none
sendbackup: time 0.265:  91:  normal(|):   DUMP: mapping (Pass I) [regular files]
sendbackup: time 167.711:  91:  normal(|):   DUMP: mapping (Pass II) [directories]
sendbackup: time 217.670:  91:  normal(|):   DUMP: estimated 124486 tape blocks.
sendbackup: time 217.706:  91:  normal(|):   DUMP: Volume 1 started with block 1 at: 
Sat Jan 18 14:05:03 2003
sendbackup: time 217.845:  91:  normal(|):   DUMP: dumping (Pass III) [directories]
sendbackup: time 218.193:  91:  normal(|):   DUMP: dumping (Pass IV) [regular files]
sendbackup: time 517.247:  91:  normal(|):   DUMP: 52.86% done at 219 kB/s, finished 
in 0:04
sendbackup: time 817.596:  91:  normal(|):   DUMP: 80.75% done at 167 kB/s, finished 
in 0:02
sendbackup: time 943.223: index tee cannot write [Broken pipe]
sendbackup: time 943.223: pid 8972 finish time Sat Jan 18 14:17:09 2003
sendbackup: time 943.829: 112:  normal(|): 
sendbackup: time 943.830: 115: strange(?): gzip: stdout: Connection reset by peer
sendbackup: time 943.831: 115: strange(?): sendbackup: index tee cannot write [Broken 
pipe]
sendbackup: time 943.832:  91:  normal(|):   DUMP: Broken pipe
sendbackup: time 943.833:  91:  normal(|):   DUMP: The ENTIRE dump is aborted.
sendbackup: time 944.265: error [compress returned 1, /sbin/dump returned 3]
sendbackup: time 944.265: pid 8969 finish time Sat Jan 18 14:17:10 2003
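
In case it helps anyone compare notes across clients, here is a minimal
sketch of how the "strange" and "error" lines could be pulled out of
sendbackup debug output like the above. It assumes the line layout shown in
this log; the function name `find_problems` and the regular expression are
my own illustration, not anything shipped with Amanda.

```python
import re

# Matches sendbackup debug lines of the two shapes seen in the log above:
#   "sendbackup: time 943.830: 115: strange(?): gzip: stdout: ..."
#   "sendbackup: time 944.265: error [compress returned 1, ...]"
# Assumption: other Amanda versions may format these lines differently.
LINE_RE = re.compile(
    r"^sendbackup: time (?P<time>\d+\.\d+): "
    r"(?:\s*\d+:\s+strange\(\?\): (?P<strange>.*)"
    r"|error (?P<error>\[.*\]))$"
)


def find_problems(lines):
    """Return (seconds, kind, message) for each strange or error line."""
    problems = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # normal(|) output and everything else is skipped
        kind = "strange" if m.group("strange") is not None else "error"
        problems.append((float(m.group("time")), kind, m.group(kind)))
    return problems
```

Feeding it the lines of a debug file (for example via `open(path)` on
whatever file your client writes; the location varies by installation)
gives the elapsed time of each failure, which makes it easier to see
whether different clients break at similar points in the run.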

These failures don't seem localized to any one machine or filesystem. Can
anyone suggest what steps to take to debug this?


-- 
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
                                                -- Benjamin Franklin
