Re: [Bacula-users] bacula mixing up slots and volumes with virtual multiple drive autochanger running on Amazon AWS storage gateway

heitor Thu, 13 Nov 2014 12:28:46 -0800

My settings for each job has prefer mounted volumes = no so each job should 
kick off until there are 10 jobs running and all 10 drives are in use. But this 
is sometimes not the case.


Mr. Jared: I think you should never use prefer mounted volumes = no . It's not 
a perfect Bacula functionality, eventually will lead you to glitches and this 
warning is in the Bacula web manual. 
If you want to use multiple volume writing, go for submitting jobs at the same 
time to different pools. 

Regards, 
========================================================================= 
Heitor Medrado de Faria 
Faltam poucos dias - Treinamento Telepresencial Bacula: 
http://www.bacula.com.br/?p=2174 
61 2021-8260 | 8268-4220 
Site: www.bacula.com.br | Facebook: heitor.faria | Gtalk: heitorfa...@gmail.com 
========================================================================== 

----- Mensagem original -----

De: "Jared Kelley" <jkel...@popcap.com> 
Para: bacula-users@lists.sourceforge.net 
Enviadas: Quinta-feira, 13 de novembro de 2014 17:39:31 
Assunto: [Bacula-users] bacula mixing up slots and volumes with virtual 
multiple drive autochanger running on Amazon AWS storage gateway 

Curious if anyone has any knowledge of this or if maybe I’ve come upon a bug 
with bacula. 
bacula Version: 5.2.6 
OS info 


cat /etc/debian_version 

jessie/sid 




lsb_release -a 

Distributor ID: Ubuntu 

Description: Ubuntu 14.04.1 LTS 

Release: 14.04 

Codename: trusty 




I’m using amazon AWS storage gateway for my auto changer and tape drives. 

This allows for 1600 slots, 10 drives and 1 auto changer. 




I am able to run up to 10 jobs at one time, writing to 10 different virtual 
tapes on 10 different drives, concurrently. 

The problem is, after many jobs have ran successfully over a period of a few 
days I start getting errors such as: 







12-Nov 20:05 backup02-sd JobId 774: 3304 Issuing autochanger "load slot 1, 
drive 1" command. 
12-Nov 20:10 backup02-sd JobId 774: Fatal error: 3992 Bad autochanger "load 
slot 1, drive 1": ERR=Child exited with code 1. 


Results=Loading media from Storage Element 1 into drive 1...Source Element 
Address 20000 is Empty 

12-Nov 20:10 svc3-02-fd JobId 774: Fatal error: job.c:1817 Bad response to 
Append Data command. Wanted 3000 OK data 



got 3903 Error append data 

OR 



12-Nov 19:10 backup02-sd JobId 772: Please mount Volume "AAAACF5F6A" or label a 
new one for: 

Job: nfs:tape.2014-11-12_18.43.30_05 

Storage: "Drive-2" 
(/dev/tape/by-path/ip-10.<IP>-iscsi-iqn.1997-05.com.amazon:sgw-20b85d49-tapedrive-02-lun-0-nst)
 

Pool: Default 



Media type: LTO 



12-Nov 19:18 backup02-sd JobId 772: Warning: Volume "AAAACF5F6A" wanted on 
"Drive-2" 
(/dev/tape/by-path/ip-10.<IP>:3260-iscsi-iqn.1997-05.com.amazon:sgw-20b85d49-tapedrive-02-
 lun-0-nst) is in use by device "Drive-1" 
(/dev/tape/by-path/ip-10.<IP>-iscsi-iqn.1997-05.com.amazon:sgw-20b85d49-tapedrive-01-lun-0-nst)
 

12-Nov 19:18 backup02-sd JobId 772: Warning: mount.c:217 Open device "Drive-2" 
(/dev/tape/by-path/ip-10.5.66.22:3260-iscsi-iqn.1997-05.com.amazon:sgw-20b85d49-tapedrive-02-lun-0-
 nst) Volume "AAAACF5F6A" failed: ERR=dev.c:506 Unable to open device "Drive-2" 
(/dev/tape/by-path/ip-10.<IP>-iscsi-iqn.1997-05.com.amazon:sgw-20b85d49- 
tapedrive-02-lun-0-nst): ERR=No medium found 







The above 2 jobs failed with an Error, but this job, with similar warnings in 
the log file, ended up completing successfully with no intervention from 
myself. 







Nov 08:35 backup02-sd JobId 785: Warning: mount.c:217 Open device "Drive-6" 
(/dev/tape/by-path/ip-10.<IP>:3260-iscsi-iqn.1997-05.com.amazon:sgw-20b85d49-tapedrive-06-lun-0-nst)
 Volume "AAAAC95F6C" failed: ERR=dev.c:506 Unable to open device "Drive-6" 
(/dev/tape/by-path/ip-10.<IP>:3260-iscsi-iqn.1997-05.com.amazon:sgw-20b85d49-tapedrive-06-lun-0-nst):
 ERR=No medium found 




13-Nov 08:35 backup02-sd JobId 785: Please mount Volume "AAAAC95F6C" or label a 
new one for: 

Job: rvsdb1-02:daily.2014-11-13_08.05.00_26 

Storage: "Drive-6" 
(/dev/tape/by-path/ip-10.5.66.22:3260-iscsi-iqn.1997-05.com.amazon:sgw-20b85d49-tapedrive-06-lun-0-nst)
 

Pool: Default 

Media type: LTO 

13-Nov 08:40 backup02-sd JobId 785: 3307 Issuing autochanger "unload slot 5, 
drive 2" command. 

13-Nov 08:40 backup02-sd JobId 785: 3304 Issuing autochanger "load slot 5, 
drive 5" command. 

13-Nov 08:40 backup02-sd JobId 785: 3305 Autochanger "load slot 5, drive 5", 
status is OK. 

13-Nov 08:40 backup02-sd JobId 785: Volume "AAAAD55F70" previously written, 
moving to end of data. 



13-Nov 08:40 backup02-sd JobId 785: Ready to append to end of Volume 
"AAAAD55F70" at file=985. 




It appears this last job, ID 785, was able to load slot 5 in a different drive 
and continue on writing to a different volume, AAAAD55F70, than it was 
originally wanting to write to, AAAAC95F6C . 

My settings for each job has prefer mounted volumes = no so each job should 
kick off until there are 10 jobs running and all 10 drives are in use. But this 
is sometimes not the case. 

I have seen jobs fail as is the case for JobId 772 and 774 as well as some 
complete such as Job 785. 

I run a list volumes and there are times where 2 separate volumes are shown 
using the same slot. 

I then run update slots scan and one duplicate slot becomes a 0. There isn’t a 
0 slot #, so I run update slots again and the correct slots and volumes are now 
listed in console as compared to the output of: 

/etc/bacula/scripts/mtx-changer /dev/sg12 listall 




It seems bacula is mixing up the slots and which volumes are in each slot all 
by itself and my only conclusion is over time, bacula is unable to successfully 
manage 1600 slots with 10 drives and concurrent jobs using the first 10 of 
those 1600 slots with the 10 drives. Of course all tapes are virtual tapes 
being written to AmazonAWS and the virtual changer is a storage gateway running 
on vmaware with iscsi targets used by bacula running on a bare-metal server 
successfully, up to a point. 




Any help or input on this issue would be greatly appreciated. If I can provide 
any further information that would be helpful please let me know that as well. 




Thanks in advance 




Jk 



------------------------------------------------------------------------------ 
Comprehensive Server Monitoring with Site24x7. 
Monitor 10 servers for $9/Month. 
Get alerted through email, SMS, voice calls or mobile push notifications. 
Take corrective actions from your mobile device. 
http://pubads.g.doubleclick.net/gampad/clk?id=154624111&iu=/4140/ostg.clktrk 
_______________________________________________ 
Bacula-users mailing list 
Bacula-users@lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/bacula-users

------------------------------------------------------------------------------
Comprehensive Server Monitoring with Site24x7.
Monitor 10 servers for $9/Month.
Get alerted through email, SMS, voice calls or mobile push notifications.
Take corrective actions from your mobile device.
http://pubads.g.doubleclick.net/gampad/clk?id=154624111&iu=/4140/ostg.clktrk

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

Re: [Bacula-users] bacula mixing up slots and volumes with virtual multiple drive autochanger running on Amazon AWS storage gateway

Reply via email to