I am using a 2.5.0 CVS snapshot of Amanda.  My backup stats are a bit messed up, and as a
result my backups wind up filling my holding disk.  That problem is not one to deal with
here; I just need to force a level 0 or two to get a disk or two back on track.

What is a problem, though, is that when the holding disk does fill up, the
backup deadlocks.  Here is my process table from last night's backup (nothing
is making progress; every process is waiting for input from another one):

 3657 ?        S      0:00      \_ /bin/sh -c mt compression 0; /usr/sbin/amdump Daily
 3660 ?        S      0:00          \_ sh /usr/sbin/amdump Daily
 3668 ?        S      0:00              \_ /usr/lib/amanda/driver Daily
 3669 ?        S      0:04                  \_ taper Daily
 3676 ?        S      0:03                  |   \_ taper Daily
 3670 ?        S      0:10                  \_ dumper Daily
 3671 ?        S     10:48                  \_ dumper Daily
 3894 ?        S      0:09                  |   \_ /bin/gzip --best
 3672 ?        S      0:00                  \_ dumper Daily
 3673 ?        S      0:00                  \_ dumper Daily
 3886 ?        S      6:47                  \_ chunker Daily
  242 ?        S      0:00 inetd
 3674 ?        S     10:44  \_ amandad
 3889 ?        S      0:00      \_ /usr/lib/amanda/sendbackup
 3890 ?        S    168:17          \_ /bin/gzip --fast
 3891 ?        S      3:30          \_ /usr/lib/amanda/sendbackup
 3893 ?        Z      0:00          |   \_ [sh <defunct>]
 3892 ?        S      0:00          \_ dump 1usf 1048576 - /dev/sdd5
 3900 ?        S      1:25              \_ dump 1usf 1048576 - /dev/sdd5
 3901 ?        S      2:42                  \_ dump 1usf 1048576 - /dev/sdd5
 3902 ?        S      2:39                  \_ dump 1usf 1048576 - /dev/sdd5
 3903 ?        S      2:42                  \_ dump 1usf 1048576 - /dev/sdd5

Here is the last part of my amdump file:

driver: result time 15398.452 from chunker1: RQ-MORE-DISK 01-00016
find diskspace: want 181760 K
find diskspace: size 181760 hf 254912 df 254880 da 181760 ha 181792
find diskspace: selected /hd1 free 254912 reserved 181792 dumpers 0
merging holding disk /hd1 to disk localhost:/data, add 181792 for reserved 3817600, left 0
driver: send-cmd time 15398.453 to chunker1: CONTINUE /hd1/20001028/localhost._data.1 1048576 181792
driver: state time 15398.453 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 15398.453 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 15398.453 hdisk 0: free 73120 dumpers 1
driver: state time 15398.454 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 15398.454 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 15398.454 hdisk 0: free 73120 dumpers 1
driver: state time 15398.454 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 15398.454 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 15398.454 hdisk 0: free 73120 dumpers 1
driver: state time 16128.292 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.292 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.292 hdisk 0: free 73120 dumpers 1
driver: result time 16128.317 from chunker1: RQ-MORE-DISK 01-00016
find diskspace: want 190848 K
find diskspace: size 190848 hf 73120 df 73088 da 73088 ha 73120
find diskspace: not enough diskspace. Left with 117760 K
find diskspace: want 190848 K
find diskspace: size 190848 hf 73120 df 73088 da 73088 ha 73120
find diskspace: not enough diskspace. Left with 117760 K
driver: send-cmd time 16128.330 to chunker1: ABORT
driver: send-cmd time 16128.331 to dumper1: ABORT
driver: state time 16128.331 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.331 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.331 hdisk 0: free 73120 dumpers 0
driver: state time 16128.331 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.331 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.331 hdisk 0: free 73120 dumpers 0
driver: state time 16128.332 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.332 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.332 hdisk 0: free 73120 dumpers 0
driver: state time 16128.426 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 0 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.426 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.426 hdisk 0: free 73120 dumpers 0
driver: result time 16128.427 from chunker1: RQ-MORE-DISK 01-00016
find diskspace: want 200416 K
find diskspace: size 200416 hf 73120 df 73088 da 73088 ha 73120
find diskspace: not enough diskspace. Left with 127328 K
find diskspace: want 200416 K
find diskspace: size 200416 hf 73120 df 73088 da 73088 ha 73120
find diskspace: not enough diskspace. Left with 127328 K
driver: state time 16128.428 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 1 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.428 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.428 hdisk 0: free 73120 dumpers -1
driver: state time 16128.428 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 1 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.428 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.428 hdisk 0: free 73120 dumpers -1
driver: state time 16128.428 free kps: 2397 space: 73120 taper: idle idle-dumpers: 3 qlen tapeq: 0 runq: 0 roomq: 1 wakeup: 0 driver-idle: no-dumpers
driver: interface-state time 16128.428 if : free 10000 if ETH1: free 10000 if LOCAL: free 997
driver: hdisk-state time 16128.428 hdisk 0: free 73120 dumpers -1
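For reference, the "Left with" figures above are consistent with the requested size minus the da value (190848 - 73088 = 117760 K, 200416 - 73088 = 127328 K), so each request after the holding disk filled asks for far more than the 73120 K still free. The small C sketch below just reproduces that arithmetic from the log lines; the struct and function names are made up for illustration, and I am only using the field labels as printed, not claiming to know what each one means inside driver.c.

```c
#include <stdio.h>

/* Fields of a "find diskspace: size ... hf ... df ... da ... ha ..." line.
 * Hypothetical struct: names are just the labels printed in the log. */
struct diskspace_line {
    long size, hf, df, da, ha;
};

/* Parse one such log line; returns 1 on success, 0 otherwise. */
int parse_diskspace(const char *line, struct diskspace_line *d)
{
    return sscanf(line, "find diskspace: size %ld hf %ld df %ld da %ld ha %ld",
                  &d->size, &d->hf, &d->df, &d->da, &d->ha) == 5;
}

/* Shortfall in K, matching the "Left with" figure in this excerpt:
 * requested size minus da, or 0 when da covers the whole request. */
long diskspace_shortfall(const struct diskspace_line *d)
{
    return d->size > d->da ? d->size - d->da : 0;
}
```

Fed the 190848 K line it yields 117760, and fed the earlier 181760 K line (where da equals size) it yields 0, which matches the driver being able to satisfy that first request.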

And that's it.  You can see that dumper1 and chunker1 were sent ABORT
commands, yet according to the process list both of them still seem to
be alive.

The chunker is still waiting for input on fd 0:

# strace -f -p 3886
read(0,  <unfinished ...>

The dumper processes are all blocked as well:

# strace -f -p 3673
read(0,  <unfinished ...>
# strace -f -p 3677
attach: ptrace(PTRACE_ATTACH, ...): No such process
# strace -f -p 3672
read(0,  <unfinished ...>
# strace -f -p 3671
write(3, " .\247\311Qz\222\3\316\377x\250Vn\301\7\rY\217\37bV]\232"..., 13120 <unfinished ...>
# strace -f -p 3894
read(0,  <unfinished ...>
# strace -f -p 3670
read(0,  <unfinished ...>

A read of the chunker.c source shows that ABORT does not really do
anything: it sets an "abort_pending" flag, but nothing in chunker.c ever
checks it.  dumper.c also sets an "abort_pending" flag when it receives an
ABORT command, and it does seem to have some code that checks the flag,
but I suspect that code is not working properly.
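To make "checking the flag" concrete, here is a minimal sketch of the kind of loop I have in mind. It is hypothetical and not taken from chunker.c or dumper.c: copy_until_abort, its return codes, and the assumption that each read() on the command descriptor returns one whole command line are all my own simplifications. It uses poll() to watch the command descriptor and the data descriptor together, so an ABORT from the driver actually ends the loop instead of only being recorded.

```c
#include <poll.h>
#include <string.h>
#include <unistd.h>

/* Write all n bytes, retrying on short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0)
            return -1;
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Hypothetical sketch, not Amanda code: copy data_fd to out_fd while also
 * watching cmd_fd, so an ABORT is acted on rather than just remembered.
 * Simplification: assumes each read() on cmd_fd yields one full command.
 * Returns 0 at EOF on data_fd, 1 after ABORT, -1 on error. */
int copy_until_abort(int cmd_fd, int data_fd, int out_fd)
{
    char buf[4096];
    int abort_pending = 0;

    for (;;) {
        struct pollfd fds[2] = {
            { .fd = cmd_fd,  .events = POLLIN },
            { .fd = data_fd, .events = POLLIN },
        };

        if (poll(fds, 2, -1) < 0)
            return -1;
        if ((fds[0].revents | fds[1].revents) & (POLLERR | POLLNVAL))
            return -1;

        /* Service the command channel first so ABORT takes priority. */
        if (fds[0].revents & (POLLIN | POLLHUP)) {
            ssize_t n = read(cmd_fd, buf, sizeof buf - 1);
            if (n < 0)
                return -1;
            if (n == 0) {
                cmd_fd = -1;    /* driver closed; poll() ignores fd -1 */
            } else {
                buf[n] = '\0';
                if (strncmp(buf, "ABORT", 5) == 0)
                    abort_pending = 1;
            }
        }

        /* Setting the flag is pointless unless something tests it. */
        if (abort_pending)
            return 1;

        if (fds[1].revents & (POLLIN | POLLHUP)) {
            ssize_t n = read(data_fd, buf, sizeof buf);
            if (n < 0)
                return -1;
            if (n == 0)
                return 0;       /* sender finished */
            if (write_all(out_fd, buf, (size_t)n) < 0)
                return -1;
        }
    }
}
```

The point of polling both descriptors is that a process blocked in a plain read() on one of them, as the chunker and dumpers are in the strace output, can never notice a command arriving on the other.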

Thoughts?

b.



-- 
Brian J. Murrell
