ARGH, this is awkward, I just figured out what was calling amcheck on Feb 14 12:00. One of our cronjobs that is testing whether a new tape for the night has to be inserted.

Didn't expect amcheck to use the same interactivity Mail-Reporting and Mail-Template. Case closed.



On 16/02/17 13:11, ts...@portrix-systems.de wrote:
The patches work great, thanks again. I have noticed another issue
regarding "device busy" though:

During backup, after the first tape change, I get more "AMANDA VOLUME
REQUEST" mails, even though the tape is still beeing written to and far
from beeing full.

I don't think the patches have caused this, I have seen it before. It is
quite difficult to reproduce and I am thinking about setting up a test
system with a virtual Tape Device like mhvtl (never tried it before) in
order to test it reliably. Testing a potential patch will take some
time, too.

Right now I believe it is only happening when amanda is still dumping to
the holding disk after the first tape change. And it is caused by
something calling amcheck while writing to tape is in progress.

Normally I would leave the backup running over night and the next day
when I do the first tape change, only writing to tape is left to do. I
don't get erroneous mails in that case.

The erroneous mails look like this:

Subject: AMANDA VOLUME REQUEST: new volume

Can't open tape device /dev/tape/by-id/scsi-3500110a00132b8f2-nst:
Device or resource busy

Insert a new volume in drive_daily
or write the name of a new changer in '/tmp/input_daily'
or write 'abort' in the file to abort the scan.

In the debug log dir I can find (mail arrived at 12:00):

amcheck-device.20170214120001.debug:

Tue Feb 14 12:00:01.472364882 2017: pid 30687: thd-0x1357e10:
amcheck-device: pid 30687 ruid 500 euid 500 version 3.4.2: start at
Tue Feb 14 12:00:01 2017
Tue Feb 14 12:00:01.472544844 2017: pid 30687: thd-0x1357e10:
amcheck-device: reading config file
/usr/local/amandatest/etc/amanda/ptxhamdaily/amanda.conf
Tue Feb 14 12:00:01.483178644 2017: pid 30687: thd-0x1357e10:
amcheck-device: pid 30687 ruid 500 euid 500 version 3.4.2: rename at
Tue Feb 14 12:00:01 2017
Tue Feb 14 12:00:01.528612782 2017: pid 30687: thd-0x1357e10:
amcheck-device: Device tape:/dev/tape/by-id/scsi-3500110a00132b8f2-nst
error = 'Can't open tape device
/dev/tape/by-id/scsi-3500110a00132b8f2-nst: Device or resource busy'
Tue Feb 14 12:00:01.528639498 2017: pid 30687: thd-0x1357e10:
amcheck-device: Device tape:/dev/tape/by-id/scsi-3500110a00132b8f2-nst
setting status flag(s): DEVICE_STATUS_DEVICE_BUSY
Tue Feb 14 12:00:01.529026195 2017: pid 30687: thd-0x1357e10:
amcheck-device: new Amanda::Changer::Error: type='fatal',
message='Can't open tape device
/dev/tape/by-id/scsi-3500110a00132b8f2-nst: Device or resource busy'
Tue Feb 14 12:00:01.529453221 2017: pid 30687: thd-0x1357e10:
amcheck-device: cmd: /usr/bin/Mail -s AMANDA VOLUME REQUEST: new
volume x...@xxx.xxx


amcheck.20170214120001.debug:

Tue Feb 14 12:00:01.346665049 2017: pid 30684: thd-0x1e9d010: amcheck:
pid 30684 ruid 0 euid 500 version 3.4.2: start at Tue Feb 14 12:00:01
2017
Tue Feb 14 12:00:01.346740802 2017: pid 30684: thd-0x1e9d010: amcheck:
reading config file
/usr/local/amandatest/etc/amanda/ptxhamdaily/amanda.conf
Tue Feb 14 12:00:01.347261333 2017: pid 30684: thd-0x1e9d010: amcheck:
pid 30684 ruid 0 euid 500 version 3.4.2: rename at Tue Feb 14 12:00:01
2017
Tue Feb 14 12:00:01.355625384 2017: pid 30686: thd-0x1e9d010:
amcheck-server: new message: amcheck.c:940:4:2800027 Amanda Tape
Server Host Check
Tue Feb 14 12:00:01.355650277 2017: pid 30686: thd-0x1e9d010:
amcheck-server: Not value for key 'dle_hostname' in message   {
    "source_filename" : "amcheck.c",
    "source_line" : "940",
    "severity" : "message",
    "process" : "amcheck-server",
    "running_on" : "amanda-server",
    "component" : "amanda",
    "module" : "amanda",
    "code" : "2800027",
    "message" : "Amanda Tape Server Host Check"
  }
Tue Feb 14 12:00:01.356030840 2017: pid 30686: thd-0x1e9d010:
amcheck-server: Spawning
"/usr/local/amandatest/libexec/amanda/amcheck-device
/usr/local/amandatest/libexec/amanda/amcheck-device ptxhamdaily
ptxhamdaily" in pipeline
Tue Feb 14 12:00:01.528935767 2017: pid 30686: thd-0x1e9d010:
amcheck-server: new message: amcheck.c:867:4:123 slot 1: Can't open
tape device /dev/tape/by-id/scsi-3500110a00132b8f2-nst: Device or
resource busy
Tue Feb 14 12:00:01.528954799 2017: pid 30686: thd-0x1e9d010:
amcheck-server: Not value for key 'dle_hostname' in message ,
  {
    "source_filename" : "amcheck.c",
    "source_line" : "867",
    "severity" : "message",
    "process" : "amcheck-server",
    "running_on" : "amanda-server",
    "component" : "amanda",
    "module" : "amanda",
    "code" : "123",
    "errstr" : "slot 1: Can't open tape device
/dev/tape/by-id/scsi-3500110a00132b8f2-nst: Device or resource busy",
    "message" : "slot 1: Can't open tape device
/dev/tape/by-id/scsi-3500110a00132b8f2-nst: Device or resource busy"
  }

Please let me know if more info is required.



On 10/02/17 17:25, Jean-Louis Martineau wrote:
I committed both patches.

Let me know if you have other issues or if it succeed!

Jean-Louis

On 10/02/17 09:37 AM, ts...@portrix-systems.de
<mailto:ts...@portrix-systems.de> wrote:
A quick test with amflush on previous holding-disk content didn't show
the issue anymore. Tape was ejected, no more locking, flushing was
able to finish both tapes. Will try a real backup next week.

Thanks alot,
Thilo


On 09/02/17 17:33, Jean-Louis Martineau wrote:
This patch is also needed.

Please try them and let me know if they works.

Jean-Louis

On 09/02/17 10:58 AM, Jean-Louis Martineau wrote:
Hi,

Can you try the attached patch.

Jean-Louis

On 08/02/17 12:58 PM, ts...@portrix-systems.de
<mailto:ts...@portrix-systems.de>
<mailto:ts...@portrix-systems.de> wrote:
> Hi,
>
> tried this with:
>

http://www.zmanda.com/downloads/community/Amanda/3.4.2/Redhat_Enterprise_6.0/amanda-backup_server-3.4.2-1.rhel6.x86_64.rpm


> (and also 3.4.1)
>
> Oracle Linux Server release 6.8
> # uname -rsv
> Linux 4.1.12-61.1.23.el6uek.x86_64 #2 SMP Tue Dec 20 16:51:41 PST
2016
>
>
> Tapeinfo:
> Vendor ID: 'HP '
> Product ID: 'Ultrium 4-SCSI '
> Revision: 'H63Z'
>
>
> The test backup doesn't fit on one tape. After I get the E-Mail from
> the interactivity module, I can see with lsof that the taper still
has
> a lock on /dev/nst1:
>
> > # lsof /dev/nst1
>> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
>> taper 1168 amandabackup 5u CHR 9,129 0t0 1812371 /dev/nst1
>
> touching the check-file looks like this in
> /var/log/amanda/server/ptxhamtest/taper.20170208122530.debug:
>
>> Wed Feb 08 16:21:12.035824026 2017: pid 1168: thd-0x240ad70: taper:
>> xfer_queue_message: MSG: <XMsg@0x7ff8bc012390 type=XMSG_PART_DONE
>> elt=<XferDestTaperSplitter@0x24090a0> version=0>
>> Wed Feb 08 16:21:12.036846793 2017: pid 1168: thd-0x240ad70: taper:
>> Building type SPLIT_FILE header of 32768-32768 bytes with
>> name='localhost' disk='/amanda/testbackupsrc' dumplevel=0 and
>> blocksize=32768
>> Wed Feb 08 16:22:52.113249836 2017: pid 1168: thd-0x240ad70: taper:
>> Device tape:/dev/nst1 error = 'No space left on device'
>> Wed Feb 08 16:22:52.113278102 2017: pid 1168: thd-0x240ad70: taper:
>> Device tape:/dev/nst1 setting status flag(s):
DEVICE_STATUS_VOLUME_ERROR
>> Wed Feb 08 16:22:53.101250403 2017: pid 1168: thd-0x240ad70: taper:
>> xfer_queue_message: MSG: <XMsg@0x7ff8bc012670 type=XMSG_PART_DONE
>> elt=<XferDestTaperSplitter@0x24090a0> version=0>
>> Wed Feb 08 16:23:19.976805822 2017: pid 1168: thd-0x1454d20: taper:
>> Will request retry of failed split part.
>> Wed Feb 08 16:23:19.977170240 2017: pid 1168: thd-0x1454d20: taper:
>> tape HAM-M-029-L4 kb 754974720 fm 73 [OK]
>> Wed Feb 08 16:23:19.980883703 2017: pid 1168: thd-0x1454d20: taper:
>> Device is in variable block size
>> Wed Feb 08 16:23:40.882859608 2017: pid 1168: thd-0x1454d20: taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
>> Wed Feb 08 16:23:40.882974991 2017: pid 1168: thd-0x1454d20: taper:
>> cmd: /usr/bin/Mail -s AMANDA VOLUME REQUEST: new volume
x...@xxx.xxx <mailto:x...@xxx.xxx>
<mailto:x...@xxx.xxx>
>>
>> Wed Feb 08 16:23:50.888937917 2017: pid 1168: thd-0x1454d20: taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
>> Wed Feb 08 16:24:00.960351367 2017: pid 1168: thd-0x1454d20: taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
>> Wed Feb 08 16:24:10.960959061 2017: pid 1168: thd-0x1454d20: taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
> ...
>> Wed Feb 08 17:20:55.510368422 2017: pid 1168: thd-0x1454d20: taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
>> Wed Feb 08 17:21:05.512939917 2017: pid 1168: thd-0x1454d20: taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
>> Wed Feb 08 17:21:15.515187884 2017: pid 1168: thd-0x1454d20: taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
>> Wed Feb 08 17:21:25.517437824 2017: pid 1168: thd-0x1454d20: taper:
>> Device tape:/dev/nst1 error = 'Can't open tape device /dev/nst1:
>> Device or resource busy'
>> Wed Feb 08 17:21:25.517455484 2017: pid 1168: thd-0x1454d20: taper:
>> Device tape:/dev/nst1 setting status flag(s):
DEVICE_STATUS_DEVICE_BUSY
>> Wed Feb 08 17:21:25.517680908 2017: pid 1168: thd-0x1454d20: taper:
>> Slot 1 without label can be labeled
>> Wed Feb 08 17:21:25.517775016 2017: pid 1168: thd-0x1454d20: taper:
>> new Amanda::Changer::Error: type='fatal', message='Can't open tape
>> device /dev/nst1: Device or resource busy'
>> Wed Feb 08 17:21:25.518772880 2017: pid 1168: thd-0x1454d20: taper:
>> cmd: /usr/bin/Mail -s AMANDA VOLUME REQUEST: new volume
x...@xxx.xxx <mailto:x...@xxx.xxx>
<mailto:x...@xxx.xxx>
>
>
>
>
> if I remove "eject-volume yes" from amanda.conf, lsof doesn't show
> /dev/nst1 as open after the mail and logfile looks like this when
> touching the check-file:
>
>> Sun Feb 05 22:01:45.892621331 2017: pid 14498: thd-0x36592a0:
taper:
>> xfer_queue_message: MSG: <XMsg@0x7f5770012690 type=XMSG_PART_DONE
>> elt=<XferDestTaperSplitter@0x3226530> version=0>
>> Sun Feb 05 22:01:45.893809646 2017: pid 14498: thd-0x36592a0:
taper:
>> Building type SPLIT_FILE header of 32768-32768 bytes with
>> name='localhost' disk='abc-rman' dumplevel=0 and blocksize=32768
>> Sun Feb 05 22:02:27.050993044 2017: pid 14498: thd-0x36592a0:
taper:
>> Device tape:/dev/nst1 error = 'No space left on device'
>> Sun Feb 05 22:02:27.051019886 2017: pid 14498: thd-0x36592a0:
taper:
>> Device tape:/dev/nst1 setting status flag(s):
DEVICE_STATUS_VOLUME_ERROR
>> Sun Feb 05 22:02:28.006847900 2017: pid 14498: thd-0x36592a0:
taper:
>> xfer_queue_message: MSG: <XMsg@0x7f5770016d70 type=XMSG_PART_DONE
>> elt=<XferDestTaperSplitter@0x3226530> version=0>
>> Sun Feb 05 22:02:54.991253275 2017: pid 14498: thd-0x2258610:
taper:
>> Will request retry of failed split part.
>> Sun Feb 05 22:02:54.992089576 2017: pid 14498: thd-0x2258610:
taper:
>> tape HAM-M-025-L4 kb 804625975 fm 137 [OK]
>> Sun Feb 05 22:02:54.993163873 2017: pid 14498: thd-0x2258610:
taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
>> Sun Feb 05 22:02:54.993302419 2017: pid 14498: thd-0x2258610:
taper:
>> cmd: /usr/bin/Mail -s AMANDA VOLUME REQUEST: new volume
x...@xxx.xxx <mailto:x...@xxx.xxx>
<mailto:x...@xxx.xxx>
>>
>> Sun Feb 05 22:03:04.998826673 2017: pid 14498: thd-0x2258610:
taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
>> Sun Feb 05 22:03:15.005238432 2017: pid 14498: thd-0x2258610:
taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
>> Sun Feb 05 22:03:25.009766390 2017: pid 14498: thd-0x2258610:
taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
> ...
>> Mon Feb 06 09:13:13.437584985 2017: pid 14498: thd-0x2258610:
taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
>> Mon Feb 06 09:13:23.455771038 2017: pid 14498: thd-0x2258610:
taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
>> Mon Feb 06 09:13:33.473860747 2017: pid 14498: thd-0x2258610:
taper:
>> new Amanda::Changer::Error: type='failed', reason='notfound',
>> message='No acceptable volumes found'
>> Mon Feb 06 09:13:38.596670849 2017: pid 14498: thd-0x2258610:
taper:
>> Device is in variable block size
>> Mon Feb 06 09:13:40.734257638 2017: pid 14498: thd-0x2258610:
taper:
>> Slot 1 with label HAM-M-026-L4 is usable
>> Mon Feb 06 09:13:45.889592195 2017: pid 14498: thd-0x2258610:
taper:
>> Building type TAPESTART header of 32768-32768 bytes with
>> name='HAM-M-026-L4' disk='' dumplevel=0 and blocksize=32768
>
>
> Thanks,
> Thilo


Reply via email to