I am running bacula-2.2.0 trying to get it working on a FreeBSD 6.2-RELEASE.
I have an autochanger with 3 drives. I have run the complete btape tests
with success. I am now trying to get multiple jobs running on multiple
tapes simultaneously. Below are scenarios I have documented. It would
appear that:
1. Drive status does not always reflect reality
2. Volume status rarely reflects reality
3. Drive and Volume status do not match up
4. Bacula's need to ask for a tape on a new drive that he
is already using.
That said, I hope there is some glaring wrong I am committing, or at the
least you have heard of these problems before and could offer me some
help...
*********************************************************************
SCENARIO 1
Two jobs are running:
140 | nfs 2007-08-21 07:25:49 Using Drive 1
141 | nfs 2007-08-21 07:25:59 Using Drive 2
However the Device Status DOES NOT MATCH
the Volume status.
Device status:
Autochanger "Autochanger" with devices:
"Drive-0" (/dev/nsa0)
"Drive-1" (/dev/nsa1)
"Drive-2" (/dev/nsa2)
Device "FileStorage" (/database/backup) is not open.
Device "Drive-0" (/dev/nsa0) is not open.
Drive 0 status unknown.
Device "Drive-1" (/dev/nsa1) is mounted with:
Volume: 000006L3
Pool: Tape1
Media type: LTO3
Slot 258 is loaded in drive 1.
Total Bytes=354,428,928 Blocks=5,493 Bytes/block=64,523
Positioned at File=2 Block=0
Device "Drive-2" (/dev/nsa2) is mounted with:
Volume: 000005L3
Pool: Tape1
Media type: LTO3
Slot 259 is loaded in drive 2.
Total Bytes=64,512 Blocks=0 Bytes/block=64,512
Positioned at File=0 Block=1
====
In Use Volume status:
000005L3 on device "Drive-0" (/dev/nsa0)
Reader=0 writers=0 reserved=0
000006L3 on device "Drive-1" (/dev/nsa1)
Reader=0 writers=1 reserved=0
====
Now lets start the third job.....
Bacula knows there is nothing in Drive 0 but he insists on
selecting a tape for that drive that is in Drive 2 even though
the Drive status says the tape is in use and the volume status
(all though WRONG!!) says the tape is IN USE....
21-Aug 07:43 mailbackup-dir: Start Backup JobId 143, Job=nfs.2007-08-21_07.43.21
21-Aug 07:43 mailbackup-dir: Using Device "Drive-0"
21-Aug 07:43 mailbackup-sd: 3301 Issuing autochanger "loaded? drive 0" command.
21-Aug 07:43 mailbackup-sd: 3302 Autochanger "loaded? drive 0", result: nothing
loaded.
21-Aug 07:43 mailbackup-sd: 3304 Issuing autochanger "load slot 12, drive 0"
command.
21-Aug 07:44 mailbackup-sd: nfs.2007-08-21_07.43.21 Fatal error: 3992 Bad
autochanger "load slot 12, drive 0": ERR=Child exite
d with code 1.
Results=chio: /dev/ch0: CHIOMOVE: Input/output error
21-Aug 07:44 nfs: nfs.2007-08-21_07.43.21 Fatal error: job.c:1758 Bad response
to Append Data command. Wanted 3000 OK data
, got 3903 Error append data
21-Aug 07:44 mailbackup-dir: nfs.2007-08-21_07.43.21 Error: Bacula
mailbackup-dir 2.2.0 (08Aug07): 21-Aug-2007 07:44:16
Build OS: i386-unknown-freebsd6.2 freebsd 6.2-RELEASE
JobId: 143
Job: nfs.2007-08-21_07.43.21
Backup Level: Full
Client: "nfs" 2.0.3 (06Mar07)
i386-unknown-openbsd4.1,openbsd,4.1
FileSet: "nfs" 2007-08-10 10:34:22
Pool: "Tape1" (From Job resource)
Storage: "Autochanger" (From Job resource)
Scheduled time: 21-Aug-2007 07:43:16
Start time: 21-Aug-2007 07:43:25
End time: 21-Aug-2007 07:44:16
Elapsed time: 51 secs
Priority: 10
FD Files Written: 0
SD Files Written: 0
FD Bytes Written: 0 (0 B)
SD Bytes Written: 0 (0 B)
Rate: 0.0 KB/s
Software Compression: None
VSS: no
Encryption: no
Volume name(s):
Volume Session Id: 11
Volume Session Time: 1187700921
Last Volume Bytes: 64,512 (64.51 KB)
Non-fatal FD errors: 0
SD Errors: 0
FD termination status: Error
SD termination status: Error
Termination: *** Backup Error ***
***************************************************************************
SCENARIO 2
It would appear that Bacula gets confused as to the tapes that are
already mounted if Bacula has been shutdown and restarted and there where
tapes left mounted in the drives. Why?
-Should I be shuting down bacula differently?
-Should I be shutting down postgres first?
Job finishes OK! But when doing status 2 2 there is no Volume name????
Device "Drive-2" (/dev/nsa2) is mounted with:
Volume:
Pool: Tape1
Media type: LTO3
Slot 259 is loaded in drive 2.
Total Bytes=23,804,928 Blocks=368 Bytes/block=64,687
Positioned at File=1 Block=0
2 tapes are loaded in drives. I shutdown sd and restart:
Device status:
Autochanger "Autochanger" with devices:
"Drive-0" (/dev/nsa0)
"Drive-1" (/dev/nsa1)
"Drive-2" (/dev/nsa2)
Device "FileStorage" (/database/backup) is not open.
Device "Drive-0" (/dev/nsa0) is mounted with:
Volume: 000007L3
Pool: *unknown*
Media type: LTO3
Slot 257 is loaded in drive 0.
Total Bytes Read=0 Blocks Read=0 Bytes/block=0
Positioned at File=0 Block=0
Device "Drive-1" (/dev/nsa1) is not open.
Drive 1 status unknown.
Device "Drive-2" (/dev/nsa2) is mounted with:
Volume: 000015L3
Pool: *unknown*
Media type: LTO3
Slot 259 is loaded in drive 2.
Total Bytes Read=0 Blocks Read=0 Bytes/block=0
Positioned at File=0 Block=0
====
In Use Volume status:
000007L3 on device "Drive-0" (/dev/nsa0)
Reader=0 writers=0 reserved=0
000015L3 on device "Drive-2" (/dev/nsa2)
Reader=0 writers=0 reserved=0
Start a job just after restart of sd.
20-Aug 08:53 mailbackup-dir: Start Backup JobId 116,
Job=maildrop0.2007-08-20_08.53.47
20-Aug 08:53 mailbackup-dir: Using Device "Drive-0"
20-Aug 08:53 mailbackup-sd: 3301 Issuing autochanger "loaded? drive 0" command.
20-Aug 08:54 mailbackup-sd: 3302 Autochanger "loaded? drive 0", result: nothing
loaded.
20-Aug 08:54 mailbackup-sd: 3304 Issuing autochanger "load slot 4, drive 0"
command.
20-Aug 08:54 mailbackup-sd: 3305 Autochanger "load slot 4, drive 0", status is
OK.
20-Aug 08:54 mailbackup-sd: 3301 Issuing autochanger "loaded? drive 0" command.
20-Aug 08:55 mailbackup-sd: 3302 Autochanger "loaded? drive 0", result is Slot
257.
20-Aug 08:55 mailbackup-sd: Volume "000004L3" previously written, moving to end
of data.
20-Aug 08:56 mailbackup-sd: maildrop0.2007-08-20_08.53.47 Error: Bacula cannot
write on tape Volume "000004L3" because:
The number of files mismatch! Volume=13 Catalog=14
20-Aug 08:56 mailbackup-sd: Marking Volume "000004L3" in Error in Catalog.
20-Aug 08:58 mailbackup-sd: 3307 Issuing autochanger "unload slot 257, drive 0"
command.
20-Aug 08:58 mailbackup-sd: 3304 Issuing autochanger "load slot 2, drive 0"
command.
20-Aug 08:59 mailbackup-sd: 3305 Autochanger "load slot 2, drive 0", status is
OK.
20-Aug 08:59 mailbackup-sd: 3301 Issuing autochanger "loaded? drive 0" command.
20-Aug 09:00 mailbackup-sd: 3302 Autochanger "loaded? drive 0", result is Slot
257.
20-Aug 09:00 mailbackup-sd: Volume "000007L3" previously written, moving to end
of data.
20-Aug 09:13 mailbackup-sd: Ready to append to end of Volume "000007L3" at
file=133.
20-Aug 09:13 mailbackup-sd: Spooling data ...
SD wants to unload the first drive. Then he wants to load slot 2
into drive 0. Status seems okay.
This "000007L3" tape was running a job that I CANCELLED prior to restarting sd.
Volume "000007L3" previously written, moving to end of data....
Device "Drive-0" (/dev/nsa0) is mounted with:
Volume: 000007L3
Pool: Tape1
Media type: LTO3
Device is being initialized.
Slot 257 is loaded in drive 0.
Total Bytes Read=0 Blocks Read=0 Bytes/block=0
Positioned at File=13 Block=0
Job starts running okay1
Start a second job....
20-Aug 09:57 mailbackup-dir: Start Backup JobId 120,
Job=maildrop1.2007-08-20_09.57.06
20-Aug 09:57 mailbackup-dir: Using Device "Drive-1"
20-Aug 09:57 mailbackup-sd: 3301 Issuing autochanger "loaded? drive 1" command.
20-Aug 09:57 mailbackup-sd: 3302 Autochanger "loaded? drive 1", result: nothing
loaded.
20-Aug 09:57 mailbackup-sd: 3304 Issuing autochanger "load slot 5, drive 1"
command.
20-Aug 09:58 mailbackup-sd: maildrop1.2007-08-20_09.57.06 Fatal error: 3992 Bad
autochanger "load slot 5, drive 1":
ERR=Child exited with code 1.
Results=chio: /dev/ch0: CHIOMOVE: Input/output error
20-Aug 09:58 maildrop1: maildrop1.2007-08-20_09.57.06 Fatal error: job.c:1758
Bad response to Append Data command. Wanted
3000 OK data
, got 3903 Error append data
Fatal error: job.c:1758 Bad response to Append Data command.
Wanted 3000 OK data , got 3903 Error append data.
Finds that nothing is loaded in drive 1. It then
wishes to load slot 5 into drive 1. Of course slot 15
is tape number 000015L3, which is currently loaded in
drive 2! This will cause a continuous problem until
tape number 000015L3 is put back into slot 5!!!!!
Device "Drive-0" (/dev/nsa0) is not open.
Drive 0 status unknown.
Device "Drive-1" (/dev/nsa1) is not open.
Drive 1 status unknown.
Device "Drive-2" (/dev/nsa2) is mounted with:
Volume:
Pool: *unknown*
Media type: LTO3
Slot 259 is loaded in drive 2.
Total Bytes Read=0 Blocks Read=0 Bytes/block=0
Positioned at File=0 Block=0
====
In Use Volume status:
====
Device "Drive-0" (/dev/nsa0) is not open.
Drive 0 status unknown.
Device "Drive-1" (/dev/nsa1) is not open.
Drive 1 status unknown.
Device "Drive-2" (/dev/nsa2) is mounted with:
Volume:
Pool: *unknown*
Media type: LTO3
Slot 259 is loaded in drive 2.
Total Bytes Read=0 Blocks Read=0 Bytes/block=0
Positioned at File=0 Block=0
====
In Use Volume status:
====
*********************************************************************
SCENARIO 3
Device status:
Autochanger "Autochanger" with devices:
"Drive-0" (/dev/nsa0)
"Drive-1" (/dev/nsa1)
"Drive-2" (/dev/nsa2)
Device "FileStorage" (/database/backup) is not open.
Device "Drive-0" (/dev/nsa0) is mounted with:
Volume: 000004L3
Pool: *unknown*
Media type: LTO3
Slot 257 is loaded in drive 0.
Total Bytes Read=0 Blocks Read=0 Bytes/block=0
Positioned at File=0 Block=0
Device "Drive-1" (/dev/nsa1) is not open.
Drive 1 status unknown.
Device "Drive-2" (/dev/nsa2) is mounted with:
Volume: 000015L3
Pool: *unknown*
Media type: LTO3
Slot 259 is loaded in drive 2.
Total Bytes Read=0 Blocks Read=0 Bytes/block=0
Positioned at File=0 Block=0
====
In Use Volume status:
000004L3 on device "Drive-0" (/dev/nsa0)
Reader=0 writers=0 reserved=0
000015L3 on device "Drive-2" (/dev/nsa2)
Reader=0 writers=0 reserved=0
====
20-Aug 10:53 mailbackup-dir: Start Backup JobId 121,
Job=maildrop0.2007-08-20_10.53.55
20-Aug 10:53 mailbackup-dir: Using Device "Drive-0"
20-Aug 10:53 mailbackup-sd: 3307 Issuing autochanger "unload slot 257, drive 0"
command.
20-Aug 10:54 mailbackup-sd: 3304 Issuing autochanger "load slot 5, drive 0"
command.
20-Aug 10:55 mailbackup-sd: maildrop0.2007-08-20_10.53.55 Fatal error: 3992 Bad
autochanger "load slot 5, drive 0": ERR=Child
exited with code 1.
Results=chio: /dev/ch0: CHIOMOVE: Input/output error
20-Aug 10:55 maildrop0: maildrop0.2007-08-20_10.53.55 Fatal error: job.c:1758
Bad response to Append Data command. Wanted 3000
OK data
, got 3903 Error append data
====
Device status:
Autochanger "Autochanger" with devices:
"Drive-0" (/dev/nsa0)
"Drive-1" (/dev/nsa1)
"Drive-2" (/dev/nsa2)
Device "FileStorage" (/database/backup) is not open.
Device "Drive-0" (/dev/nsa0) is not open.
Drive 0 status unknown.
Device "Drive-1" (/dev/nsa1) is not open.
Drive 1 status unknown.
Device "Drive-2" (/dev/nsa2) is mounted with:
Volume:
Pool: *unknown*
Media type: LTO3
Slot 259 is loaded in drive 2.
Total Bytes Read=0 Blocks Read=0 Bytes/block=0
Positioned at File=0 Block=0
====
In Use Volume status:
====
====
Bacula starts it's first job after a Bacula restart and having tapes still in
some drives.
He knows that tape number 000004L3 is loaded in drive 0 and tape number
000015L3 is
loaded in drive 2.
Device "Drive-0" (/dev/nsa0) is mounted with:
Volume: 000004L3
Pool: *unknown*
Media type: LTO3
Slot 257 is loaded in drive 0.
Total Bytes Read=0 Blocks Read=0 Bytes/block=0
Positioned at File=0 Block=0
Device "Drive-1" (/dev/nsa1) is not open.
Drive 1 status unknown.
Device "Drive-2" (/dev/nsa2) is mounted with:
Volume: 000015L3
Pool: *unknown*
Media type: LTO3
Slot 259 is loaded in drive 2.
Total Bytes Read=0 Blocks Read=0 Bytes/block=0
Positioned at File=0 Block=0
====
In Use Volume status:
000004L3 on device "Drive-0" (/dev/nsa0)
Reader=0 writers=0 reserved=0
000015L3 on device "Drive-2" (/dev/nsa2)
Reader=0 writers=0 reserved=0
====
Bacula first says he wants to use drive 0. Because he KNOWS the status, he
immedeately issues an unload of
drive 0. Bacula wishes to use tape number 000015L3 in drive 0 but it is already
loaded on drive 2. Start a new job and
error: job.c:1758. Bad response to Append Data command. Wanted 3000 OK data ,
got 3903 Error append data
Start another job and
error: job.c:1758. Bad response to Append Data command. Wanted 3000 OK data ,
got 3903 Error append data
Output from the second "status 2 2" command shows that Bacula nolonger knows
what tape is in drive 2.
This problem will continue to happen until I unload tape number 000015L3.
Keeping in mind that at this point in the discussion, Bacula has a "hold" on
drive 0 trying to load tape 000015L3, as discussed above. We will compund the
problem
by starting a few more jobs....
Bacula will get one job to use drive 1 with a new tape 000011L3. This job goes
on it's merry way.
The next job starts....
Bacula decides to use drive 0, (bacula is finished with the "hold"
he has on drive 0) and the next available tape 000011L3 thinking
tape 000011L3 is still in a slot when it is really in drive 1.
error: job.c:1758. Bad response to Append Data command. Wanted 3000 OK data ,
got 3903 Error append data
This problem will continue to happen until I can free up tape 000011L3 for
drive 0.
Do a chio status -a
drive 0: <ACCESS> sense: <0x00/0x00> voltag: <:0> avoltag: <:0> source: <>
intaddr: <256> scsi: <3:?>
drive 1: <FULL> sense: <0x00/0x00> voltag: <000011L3:0> avoltag: <:0> source:
<slot 5> intaddr: <257> scsi: <1:?>
drive 2: <FULL> sense: <0x00/0x00> voltag: <000015L3:0> avoltag: <:0> source:
<slot 4> intaddr: <258> scsi: <2:?>
Tape number 000011L3 in drive 1
Tape number 000015L3 in drive 2
Do another status 2 2 and WHAT????
Bacula now thinks that the Device status is:
Device "Drive-0" (/dev/nsa0) is not open.
Drive 0 status unknown.
Device "Drive-1" (/dev/nsa1) is mounted with:
Volume:
Pool: Tape1
Media type: LTO3
Slot 258 is loaded in drive 1.
Total Bytes=1,864,654,848 Blocks=28,903 Bytes/block=64,514
Positioned at File=2 Block=0
Device "Drive-2" (/dev/nsa2) is mounted with:
Volume:
Pool: *unknown*
Media type: LTO3
Slot 259 is loaded in drive 2.
Total Bytes Read=0 Blocks Read=0 Bytes/block=0
Positioned at File=0 Block=0
Bacula now thinks the In Use Volume status is:
000011L3 on device "Drive-0" (/dev/nsa0)
Reader=0 writers=0 reserved=0
It would appear that Bacula is doing 2 things at job start.
1. Decide what drive he wishes to use.
2. Decide what tape he wishes to use.
The problem is that Bacula is not thinking about what is in use and where it is.
Note the last status 2 2 after the job fails with a fatal error.
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel