Hello,
Volker Dierks wrote:
>> Usually, I'd see if the problem can be reproduced with the existing
>> system setup. If that's possible, I'd first check if the actual cause
>> might be purely SCSI device related.
> That's what I'm going to do first. I'll create the second pool again
> (with the same tapes) and put all nodes into that pool ...
I did this last night. In order:
- the backup started as planned on drive two with the same tape as on
Thursday (the tape was already mounted, so no mtx activity took place)
- after a few minutes (and 500 MB written to that tape) everything
hung again, so I restarted everything and disabled that tape
- I mounted the next tape and started the backup again. After 7 GB had
been written to that tape (and 5 nodes backed up successfully) I went
to bed.
Up to this point, it looked as if the problems really were caused by
the tape. But this morning I got the following mail:
12-Dec 03:24 mw-mcs-sd: nfs-1.2005-12-12_02.15.08 Error: block.c:538 Write error at
12:5438 on device "Drive-2" (/dev/nst1). ERR=Input/output error.
12-Dec 03:24 mw-mcs-sd: nfs-1.2005-12-12_02.15.08 Error: Error writing final
EOF to tape. This Volume may not be readable.
dev.c:1553 ioctl MTWEOF error on "Drive-2" (/dev/nst1). ERR=No such device or
address.
12-Dec 03:24 mw-mcs-sd: End of medium on Volume "MW-MCS-1-12"
Bytes=7,078,064,979 Blocks=109,722 at 12-Dec-2005 03:24.
12-Dec 03:24 mw-mcs-sd: 3301 Issuing autochanger "loaded drive 1" command.
12-Dec 03:24 mw-mcs-sd: 3302 Autochanger "loaded drive 1", result is Slot 12.
12-Dec 04:10 mw-mcs-sd: 3307 Issuing autochanger "unload slot 12, drive 1"
command.
12-Dec 04:14 mw-mcs-sd: 3995 Bad autochanger "unload slot 13, drive 1":
ERR=Child died from signal 15: Termination.
12-Dec 04:14 mw-mcs-sd: Please mount Volume "MW-MCS-1-13" on Storage Device
"Drive-2" (/dev/nst1) for Job nfs-1.2005-12-12_02.15.08
12-Dec 05:14 mw-mcs-sd: Please mount Volume "MW-MCS-1-13" on Storage Device
"Drive-2" (/dev/nst1) for Job nfs-1.2005-12-12_02.15.08
12-Dec 07:14 mw-mcs-sd: Please mount Volume "MW-MCS-1-13" on Storage Device
"Drive-2" (/dev/nst1) for Job nfs-1.2005-12-12_02.15.08
12-Dec 08:59 nfs-1-fd: nfs-1.2005-12-12_02.15.08 Fatal error: backup.c:498
Network send error to SD. ERR=Broken pipe
12-Dec 08:59 mw-mcs-dir: nfs-1.2005-12-12_02.15.08 Error: Bacula 1.38.2
(20Nov05): 12-Dec-2005 08:59:32
At 08:59 I stopped bacula-dir and bacula-sd. The kernel log contains the
same SCSI ABORT messages as posted before, starting at 02:54:
Dec 12 02:54:30 backup kernel: scsi1:0:5:0: Attempting to queue an ABORT message
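To see how many of these ABORT messages each SCSI address produced, something like the following could be used. This is only a rough sketch: the function name count_aborts is made up for illustration, and it assumes the kernel log lines look exactly like the one quoted above.

```shell
#!/bin/sh
# Count "Attempting to queue an ABORT message" lines per SCSI address.
# Reads kernel log lines on stdin and prints "<count> <scsi address>".
# count_aborts is a hypothetical helper name, not part of any tool.
count_aborts() {
    grep 'Attempting to queue an ABORT message' |
        sed 's/.*kernel: \(scsi[0-9:]*\): Attempting.*/\1/' |
        sort | uniq -c | awk '{print $1, $2}'
}

# Demo using the single log line quoted in this mail.
printf '%s\n' \
    'Dec 12 02:54:30 backup kernel: scsi1:0:5:0: Attempting to queue an ABORT message' |
    count_aborts
# prints: 1 scsi1:0:5:0
```

On a real system the input would be the kernel log file instead of the printf line; where that file lives depends on the syslog setup.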
The last thing I can think of is this: all tapes used in Drive-2 so far
had been used before (by amanda). This is how I recycled them:
mt -f /dev/nst1 rewind
mt -f /dev/nst1 setdensity 0x89
mt -f /dev/nst1 rewind
mt -f /dev/nst1 weof
mt -f /dev/nst1 weof
write the Bacula label
Perhaps this is not the right way? I've attached our configuration and
would be very grateful if someone could confirm that it's correct. It's
the one-drive configuration pointing to Pool: DRIVE-2. When I use this
configuration against Pool: DRIVE-1 (all tapes in this pool are brand
new), everything works fine.
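For reference, the recycle steps listed above could be wrapped in a small script like the one below. It is only a sketch of what I did by hand: the DRY_RUN switch and the names run and recycle_tape are made up for illustration, and by default the script only prints the mt commands instead of touching the drive.

```shell
#!/bin/sh
# Sketch of the manual recycle sequence for a previously used tape.
# DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 to actually run them against the drive.
DRY_RUN="${DRY_RUN:-1}"
TAPE="${TAPE:-/dev/nst1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same order as the manual steps; stop at the first failing command.
recycle_tape() {
    run mt -f "$TAPE" rewind || return 1
    run mt -f "$TAPE" setdensity 0x89 || return 1
    run mt -f "$TAPE" rewind || return 1
    run mt -f "$TAPE" weof || return 1
    run mt -f "$TAPE" weof || return 1
    # The Bacula label is written afterwards, outside this script.
}

recycle_tape
```

Stopping at the first error at least shows which step the drive rejects, which the manual sequence does not make obvious.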
Volker
PS: I'm running "mt -f /dev/nst1 erase" on MW-MCS-1-12 at the moment.
If this fails, I would say that drive two is faulty.
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users