So I moved my backup server to a new host running Ubuntu 18.04
(Bionic Beaver). All the Amanda stuff is working OK except for the
localhost root filesystem dump, which goes directly to tape (no holding
disk). That one hangs for a while, then fails with the error
"sendbackup: critical (fatal): index tee cannot write [Broken pipe]".

The problem appears to be that sendbackup is deadlocked. This is what I
observe:

Thu Apr 12 16:45:18 vectro@cue:~ {1}$ pstree -ap 28078
amdump,28078 /usr/sbin/amdump vectro.org
  └─driver,28084 vectro.org --log-filename /var/amanda/vectro.org/logs/log.20180412164328.0
      ├─dumper,28086 vectro.org --log-filename /var/amanda/vectro.org/logs/log.20180412164328.0
      │   ├─amandad,28356 -auth=local
      │   │   ├─(amandad,28363)
      │   │   └─sendbackup,28362 amandad local --shm-name /amanda_shm_control-28085-0
      │   │       ├─sendbackup,28364 amandad local --shm-name /amanda_shm_control-28085-0
      │   │       │   └─sh,28369 -c /bin/tar -tf - 2>/dev/null | sed -e 's/^\\.//'
      │   │       │       ├─sed,28371 -e s/^\\.//
      │   │       │       └─tar,28370 -tf -
      │   │       ├─tar,28367 --create --file - --directory / --one-file-system --listed-incremental /var/lib/amanda/gnutar-lists/localhost__0.new ...
      │   │       │   └─(sh,28372)
      │   │       └─{sendbackup},28368
      │   ├─gzip,28365 --best
      │   └─{dumper},28286
      ├─dumper,28087 vectro.org --log-filename /var/amanda/vectro.org/logs/log.20180412164328.0
      ├─dumper,28088 vectro.org --log-filename /var/amanda/vectro.org/logs/log.20180412164328.0
      ├─dumper,28089 vectro.org --log-filename /var/amanda/vectro.org/logs/log.20180412164328.0
      └─taper,28085 /usr/lib/amanda/taper vectro.org --storage vectro.org --log-filename /var/amanda/vectro.org/logs/log.20180412164328.0
          ├─{taper},28351
          └─{taper},28352
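
For anyone wanting to gather the same kind of pipe ownership data for the
processes in this tree: on Linux, each entry in /proc/<pid>/fd is a symlink,
and for a pipe its target looks like "pipe:[483972]". The following is a
minimal sketch under that assumption, not necessarily how the numbers in this
message were collected. The names parse_pipe_inode and pipe_holders are
made up for illustration.

```python
import os
import re
from collections import defaultdict

# /proc/<pid>/fd/<n> links to "pipe:[<inode>]" when the descriptor is a pipe.
PIPE_LINK = re.compile(r"^pipe:\[(\d+)\]$")


def parse_pipe_inode(link_target):
    """Return the pipe inode from a link target like 'pipe:[483972]', else None."""
    match = PIPE_LINK.match(link_target)
    return int(match.group(1)) if match else None


def pipe_holders(pids, proc_root="/proc"):
    """Map pipe inode -> list of (pid, fd) pairs that have that pipe open."""
    holders = defaultdict(list)
    for pid in pids:
        fd_dir = os.path.join(proc_root, str(pid), "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process is gone, or we lack permission to inspect it
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            inode = parse_pipe_inode(target)
            if inode is not None:
                holders[inode].append((pid, int(fd)))
    return dict(holders)
```

Run as root (or as the user owning the processes), a call such as
pipe_holders([28362, 28364, 28367, 28369, 28370, 28371]) would group the
descriptors by pipe, so any pipe with a blocked process on one side and its
holders on the other stands out.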

The processes are blocked on pipes and on each other, as follows (I can
make a diagram if needed):

  * Process 28362 waits to read pipe 483972 (fd 0). The other end of
    this pipe is held as fd 2 by pids 28367, 28369, and 28371.
  * Process 28367 waits to write pipe 483946 (fd 1). The other end of
    this pipe is held as fd 1 by pid 28364.
  * Process 28364 waits to write pipe 483943 (fd 3). The other end of
    this pipe is held as fd 4 by pid 28362.
  * Process 28369 is just waiting to reap its children (28371 and 28370).
  * Process 28371 waits to read pipe 485744 (fd 0). The other end of
    this pipe is held as fd 1 by pid 28370.
  * Process 28370 waits to read pipe 485743 (fd 0). The other end of
    this pipe is held as fd 5 by pid 28364.
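
The list above can be read as a wait-for graph. The sketch below transcribes
it into a dictionary and searches for a cycle with a depth-first walk. It
treats every holder of the other end of a pipe as a process the waiter
depends on, which is a simplification (a reader only needs one writer to
produce data, or all of them to close). The names WAITS_ON and find_cycle
are illustrative only.

```python
# Wait-for edges transcribed from the list above: each pid maps to the pids
# that hold the other end of the pipe it is blocked on (or, for 28369, the
# children it is waiting to reap).
WAITS_ON = {
    28362: [28367, 28369, 28371],  # read on pipe 483972
    28367: [28364],                # write on pipe 483946
    28364: [28362],                # write on pipe 483943
    28369: [28371, 28370],         # waiting for children to exit
    28371: [28370],                # read on pipe 485744
    28370: [28364],                # read on pipe 485743
}


def find_cycle(graph):
    """Return one cycle as a list of pids (first == last), or None if acyclic."""
    visiting, done = set(), set()
    path = []

    def visit(node):
        visiting.add(node)
        path.append(node)
        for nxt in graph.get(node, []):
            if nxt in visiting:
                # Back edge: the cycle is the tail of the current path.
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                found = visit(nxt)
                if found:
                    return found
        visiting.discard(node)
        done.add(node)
        path.pop()
        return None

    for start in graph:
        if start not in done:
            found = visit(start)
            if found:
                return found
    return None


cycle = find_cycle(WAITS_ON)
print(" -> ".join(str(pid) for pid in cycle))
# prints: 28362 -> 28367 -> 28364 -> 28362
```

Under this reading, the core of the hang is the loop between the parent
sendbackup (28362), the dumping tar (28367), and the index-tee sendbackup
(28364). The sh/sed/tar index pipeline is stuck behind 28364 rather than
being part of the loop itself.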

Here are backtraces for the sendbackup processes:

Attaching to process 28362
0x00007fd7116a9384 in __libc_read (fd=0, buf=0x55d8aa4c7090, nbytes=8192) at ../sysdeps/unix/sysv/linux/read.c:27
27      ../sysdeps/unix/sysv/linux/read.c: No such file or directory.
#0  0x00007fd7116a9384 in __libc_read (fd=0, buf=0x55d8aa4c7090, nbytes=8192) at ../sysdeps/unix/sysv/linux/read.c:27
#1  0x00007fd711c073c3 in debug_areads () from /usr/lib/x86_64-linux-gnu/amanda/libamanda-3.5.1.so
#2  0x000055d8a8be12d3 in parse_backup_messages ()
#3  0x000055d8a8bde591 in main ()
Detaching from program: /usr/lib/amanda/sendbackup, process 28362

Attaching to process 28364
0x00007fd7116a9281 in __libc_write (fd=3, buf=0x7fff8c6b19d0, nbytes=8192) at ../sysdeps/unix/sysv/linux/write.c:27
27      ../sysdeps/unix/sysv/linux/write.c: No such file or directory.
#0  0x00007fd7116a9281 in __libc_write (fd=3, buf=0x7fff8c6b19d0, nbytes=8192) at ../sysdeps/unix/sysv/linux/write.c:27
#1  0x00007fd711c27556 in safe_write () from /usr/lib/x86_64-linux-gnu/amanda/libamanda-3.5.1.so
#2  0x00007fd711c26d7e in full_write () from /usr/lib/x86_64-linux-gnu/amanda/libamanda-3.5.1.so
#3  0x000055d8a8be1895 in start_index ()
#4  0x000055d8a8be2c29 in ?? ()
#5  0x000055d8a8bde56d in main ()
Detaching from program: /usr/lib/amanda/sendbackup, process 28364
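
Backtraces like the two above can be collected in one pass by running gdb
non-interactively against each pid. This is a minimal sketch, assuming gdb is
installed and the caller is allowed to attach to the processes. The helper
names backtrace_command and collect_backtraces are hypothetical, and this
may not match how the traces above were actually taken.

```python
import subprocess


def backtrace_command(pid):
    """Build a gdb invocation that attaches to pid, prints a backtrace, and exits."""
    # -p attaches to a running process, -batch exits after the commands run,
    # -ex supplies a command to execute ("bt" prints the backtrace).
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def collect_backtraces(pids):
    """Return {pid: combined gdb output}. Needs gdb and ptrace permission."""
    results = {}
    for pid in pids:
        proc = subprocess.run(
            backtrace_command(pid),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one attach fails
        )
        results[pid] = proc.stdout + proc.stderr
    return results
```

For the hang described here, collect_backtraces([28362, 28364]) would cover
the two sendbackup processes. Installing debug symbols for the amanda
packages, if available, should resolve frames shown as "??".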

This is using the Amanda packages included with Ubuntu 18.04, which
provide Amanda 3.5.1.

Any thoughts on how to troubleshoot/fix this?

Cheers,

Ian
