I am using a 2.5.0 CVS snapshot of Amanda. My backup stats are a bit screwed and as a result my backup winds up filling up my holding disk. None of that is a problem that should be dealt with here. I just need to force a level 0 or two to get a disk or two back on track.

What is a problem though is that when the holding disk does fill up, the backup deadlocks. Here is my process table from last night's backup (it is doing nothing, everybody is waiting for input from somebody else):

 3657 ? S   0:00 \_ /bin/sh -c mt compression 0; /usr/sbin/amdump Daily
 3660 ? S   0:00 \_ sh /usr/sbin/amdump Daily
 3668 ? S   0:00 \_ /usr/lib/amanda/driver Daily
 3669 ? S   0:04 \_ taper Daily
 3676 ? S   0:03 | \_ taper Daily
 3670 ? S   0:10 \_ dumper Daily
 3671 ? S  10:48 \_ dumper Daily
 3894 ? S   0:09 | \_ /bin/gzip --best
 3672 ? S   0:00 \_ dumper Daily
 3673 ? S   0:00 \_ dumper Daily
 3886 ? S   6:47 \_ chunker Daily
  242 ? S   0:00 inetd
 3674 ? S  10:44 \_ amandad
 3889 ? S   0:00 \_ /usr/lib/amanda/sendbackup
 3890 ? S 168:17 \_ /bin/gzip --fast
 3891 ? S   3:30 \_ /usr/lib/amanda/sendbackup
 3893 ? Z   0:00 | \_ [sh <defunct>]
 3892 ? S   0:00 \_ dump 1usf 1048576 - /dev/sdd5
 3900 ? S   1:25 \_ dump 1usf 1048576 - /dev/sdd5
 3901 ? S   2:42 \_ dump 1usf 1048576 - /dev/sdd5
 3902 ? S   2:39 \_ dump 1usf 1048576 - /dev/sdd5
 3903 ? S   2:42 \_ dump 1usf 1048576 - /dev/sdd5

Here is the last part of my amdump file:

driver: result time 15398.452 from chunker1: RQ-MORE-DISK 01-00016
find diskspace: want 181760 K
find diskspace: size 181760 hf 254912 df 254880 da 181760 ha 181792
find diskspace: selected /hd1 free 254912 reserved 181792 dumpers 0
merging holding disk /hd1 to disk localhost:/data, add 181792 for reserved 3817600, left 0
driver: send-cmd time 15398.453 to chunker1: CONTINUE /hd1/20001028/localhost._data.1 1048576 181792
driver: state time 15398.453 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 15398.453 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 15398.453 hdisk 0: free 73120 dumpers 1
driver: state time 15398.454 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 15398.454 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 15398.454 hdisk 0: free 73120 dumpers 1
driver: state time 15398.454 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 15398.454 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 15398.454 hdisk 0: free 73120 dumpers 1
driver: state time 16128.292 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.292 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.292 hdisk 0: free 73120 dumpers 1
driver: result time 16128.317 from chunker1: RQ-MORE-DISK 01-00016
find diskspace: want 190848 K
find diskspace: size 190848 hf 73120 df 73088 da 73088 ha 73120
find diskspace: not enough diskspace. Left with 117760 K
find diskspace: want 190848 K
find diskspace: size 190848 hf 73120 df 73088 da 73088 ha 73120
find diskspace: not enough diskspace. Left with 117760 K
driver: send-cmd time 16128.330 to chunker1: ABORT
driver: send-cmd time 16128.331 to dumper1: ABORT
driver: state time 16128.331 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.331 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.331 hdisk 0: free 73120 dumpers 0
driver: state time 16128.331 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.331 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.331 hdisk 0: free 73120 dumpers 0
driver: state time 16128.332 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.332 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.332 hdisk 0: free 73120 dumpers 0
driver: state time 16128.426 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.426 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.426 hdisk 0: free 73120 dumpers 0
driver: result time 16128.427 from chunker1: RQ-MORE-DISK 01-00016
find diskspace: want 200416 K
find diskspace: size 200416 hf 73120 df 73088 da 73088 ha 73120
find diskspace: not enough diskspace. Left with 127328 K
find diskspace: want 200416 K
find diskspace: size 200416 hf 73120 df 73088 da 73088 ha 73120
find diskspace: not enough diskspace. Left with 127328 K
driver: state time 16128.428 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 1 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.428 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.428 hdisk 0: free 73120 dumpers -1
driver: state time 16128.428 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 1 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.428 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.428 hdisk 0: free 73120 dumpers -1
driver: state time 16128.428 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 1 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.428 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.428 hdisk 0: free 73120 dumpers -1

and that's it. You can see that dumper1 and chunker1 were sent ABORT commands, although according to the process list they both seem to still be alive.

chunker is still waiting for input on fd 0:

# strace -f -p 3886
strace -f -p 3886
read(0, <unfinished ...>

dumper is blocked in all of its processes:

# strace -f -p 3673
strace -f -p 3673
read(0, <unfinished ...>
# strace -f -p 3677
strace -f -p 3677
attach: ptrace(PTRACE_ATTACH, ...): No such process
# strace -f -p 3672
strace -f -p 3672
read(0, <unfinished ...>
# strace -f -p 3671
strace -f -p 3671
write(3, " .\247\311Qz\222\3\316\377x\250Vn\301\7\rY\217\37bV]\232"..., 13120 <unfinished ...>
# strace -f -p 3894
strace -f -p 3894
read(0, <unfinished ...>
# strace -f -p 3670
strace -f -p 3670
read(0, <unfinished ...>

Now a read of the chunker.c source reveals that ABORT does not really do anything. It sets an "abort_pending" flag, but nowhere in chunker.c is that flag ever checked.
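
To make concrete what I would have expected to find, here is a minimal sketch of a loop that actually honours such a flag. This is not the real chunker.c code: handle_command, run_chunker and struct step are names I made up for illustration, and the real process reads from pipes and sockets rather than from an in-memory array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the flag that gets set on ABORT. */
static int abort_pending = 0;

/* Hypothetical command handler: only records that ABORT was requested. */
static void handle_command(const char *cmd)
{
    assert(cmd != NULL);
    if (strcmp(cmd, "ABORT") == 0)
        abort_pending = 1;
}

/* One simulated event: a driver command (cmd != NULL) or a data block. */
struct step {
    const char *cmd;      /* command text, or NULL for a data block */
    size_t      data_len; /* bytes of data carried by this step */
};

/*
 * Hypothetical main loop. Returns the number of bytes "written" before
 * the loop ended, either by running out of steps or by seeing ABORT.
 */
static size_t run_chunker(const struct step *steps, size_t nsteps)
{
    size_t written = 0;

    abort_pending = 0;
    for (size_t i = 0; i < nsteps; i++) {
        if (steps[i].cmd != NULL)
            handle_command(steps[i].cmd);

        /* The check that appears to be missing: stop once ABORT is seen. */
        if (abort_pending)
            break;

        written += steps[i].data_len;
    }
    return written;
}
```

With steps of 10 bytes, 20 bytes, an ABORT, and then 30 more bytes, run_chunker stops at the ABORT and reports 30 bytes written. Without the check, the loop would carry on and wait for more input, which looks like what the strace output shows.
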
dumper.c also sets an "abort_pending" flag when it gets an ABORT command, and it does seem to have some code to check the flag, but I guess that is not working properly.

Thots?

b.

--
Brian J. Murrell
