Hello,

See my answer below.
On Wed, 2014-04-09 at 21:08 +0200, Kern Sibbald wrote:
> Hello,
> 
> See my comments below ...
> 
> On 04/09/2014 01:46 PM, Ulrich Leodolter wrote:
> > Hello,
> >
> > I am testing one of the new Bacula 7.x features:
> >
> > *Migration/Copy/VirtualFull Performance Enhancements*
> >
> > The Bacula Storage daemon now permits multiple jobs to simultaneously read
> > the same disk Volume, which gives substantial performance enhancements
> > when running Migration, Copy, or VirtualFull jobs that read disk Volumes.
> > Our testing shows that when running multiple simultaneous jobs, the jobs
> > can finish up to ten times faster with this version of Bacula.
> > This is built-in to the Storage daemon, so it happens automatically and
> > transparently.
> >
> >
> > In our setup we have two CopyDiskToTape jobs which write to different
> > pools on tape storage.
> > Our storage is a 2-drive autochanger device.
> >
> > Before the copy jobs are started, each drive has a volume from one of
> > the destination pools mounted.
> >
> > The problem is that both copy jobs only look at drive index 0,
> > and premounted volumes are always swapped (unmounted/mounted) at drive 0.
> >
> >
> > We have a second Bacula installation running 5.2.13
> > which has more or less the same setup and hardware.
> > On this installation, parallel copy jobs can run without
> > swapping volumes on autochanger drive 0.
> > To overcome the exclusive read-lock limitation in that Bacula version,
> > we have defined two file storage devices which point to the
> > same location. Our SQL selects the copy jobs in opposite order
> > for the two jobs, so we can minimize the number of conflicts
> > when one file volume is already locked.
> 
> Unfortunately, I don't understand well enough what the real problem is. 
> It sounds like you are saying there is a problem on one bacula
> installation and not on the other, which would imply that there is
> something in the conf that is triggering the problem.
> 
> I am also not sure what the problem with swapping volumes is. With the
> current (rather primitive) algorithm, when no jobs are running Bacula
> will always look at the drives in the order they are in the conf file,
> or perhaps in alphabetical order, so it will always look at a
> particular drive first. If that drive is not being used, the job will be
> assigned that drive.
> 
> A better behavior might be to search for an empty drive and always start
> with that one, but that will not be an ideal solution as at some point
> all the drives will have a volume in them so some volume needs to be
> swapped.
> 
> I have been meaning to work on improving the tape usage algorithm, in
> particular putting in a better round robin scheme than currently exists,
> but unfortunately there always seem to be more urgent tasks, and the
> list of things to do is getting larger rather than smaller, so I am
> probably not being very optimistic here.
> 
> If there is a definitive bug here rather than an inefficiency, and it can
> be clearly described I might be able to fix it.
> 
> >
> >
> > my question:
> >
> > Has something changed in Bacula 7.x in how Bacula determines
> > whether a volume is already mounted in an autochanger device?
> No, nothing has changed.  If a volume is premounted and a job wants to
> use it, Bacula should notice that and select that drive, because part of
> the current algorithm is to look at all pre-mounted volumes to see if
> one can be used. If that is not the case, and you are talking about a
> single job (no other jobs contending for the same resources), I would
> like to see a detailed analysis of what is going wrong, because I could
> probably fix it.
> >
> > Why does Bacula not use a premounted volume in drive index 1?
> Good question.  I would need to know the exact conditions before I could
> answer, but most likely Bacula sees that drive index 0 is available and
> takes it, and then asks for a tape and gets a different one.  If it
> swaps the Volume from drive 1 onto drive 0 and drive 1 is not being
> used, then the current algorithm is not working correctly, but to fix it
> I would need a reproducible case.
> 
> Best regards,
> Kern

It seems there is a bug: Bacula 7.0.x behaves differently from
version 5.x.x.

Yesterday I disabled one of my copy jobs, the one which should
use drive 1. The other copy job was scheduled today at 6:05
and completed normally. Below you can see the storage status after the
first copy job finished. There is one important thing in the status
below: volume BACX.101 from pool ExtClone is still mounted in
QTM-Drive-1. I mounted it manually yesterday evening.

Today I started the second copy job, which has ExtClone as its destination
pool. Because BACX.101 from the ExtClone pool is already mounted, I expected
Bacula to use it, but see below what happened (JobId 615860): BACX.101
was swapped into QTM-Drive-0.

I am 100% sure this worked without swapping in version 5.2.13.
This test also shows that it has nothing to do with parallel copy jobs.
After upgrading to 7.0.2 I did not change any configuration settings, and
the configuration does not include PreferMountedVolumes, so I expect the
default value, PreferMountedVolumes=Yes, to apply.
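To make the expectation concrete, here is a toy Python model of the two drive-selection behaviours discussed in this thread. It is a hypothetical simplification, not Bacula's actual reservation code: the Drive class and both functions are invented for illustration, and the model ignores volume status, media type and concurrent reservations. The drive contents are taken from the status output below.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Drive:
    # Hypothetical, simplified view of one autochanger drive.
    name: str
    volume: Optional[str]  # volume currently loaded, None if empty
    pool: Optional[str]    # pool of the loaded volume, None if empty
    in_use: bool = False   # True if another job has reserved the drive


def first_free_drive(drives: List[Drive]) -> Optional[Drive]:
    """Take the first unused drive in configuration order."""
    for d in drives:
        if not d.in_use:
            return d
    return None


def prefer_mounted(drives: List[Drive], pool: str) -> Optional[Drive]:
    """Prefer an unused drive that already holds a volume from `pool`;
    otherwise fall back to the first unused drive."""
    for d in drives:
        if not d.in_use and d.pool == pool:
            return d
    return first_free_drive(drives)


# Drive contents as reported by "status storage=QTM-Tape".
drives = [
    Drive("QTM-Drive-0", "BACU.113", "DiskCopy"),
    Drive("QTM-Drive-1", "BACX.101", "ExtClone"),
]

print(first_free_drive(drives).name)            # QTM-Drive-0 (what JobId 615860 used)
print(prefer_mounted(drives, "ExtClone").name)  # QTM-Drive-1 (what I expected)
```

With PreferMountedVolumes=Yes I would expect behaviour like prefer_mounted(); the log for JobId 615860 instead looks like first_free_drive(), so BACX.101 was moved from drive 1 to drive 0.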

Regarding the algorithm, I will answer separately and try to
explain what I would call a "natural" or clever algorithm.
But I fully understand this is not a trivial task, because it
may break existing installations.

Do you need more information?
When I have time, I will try to find out why Bacula 7.0.x behaves differently.

Best regards
Ulrich


*status storage=QTM-Tape
Connecting to Storage daemon QTM-Tape at troll.obvsg.at:9103

troll-sd Version: 7.0.2 (02 April 2014) x86_64-unknown-linux-gnu redhat 
Daemon started 03-Apr-14 14:36. Jobs: run=721, running=0.
 Heap: heap=270,336 smbytes=15,158,906 max_bytes=17,978,512 bufs=383 max_bufs=5,779
 Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 mode=0,0

Running Jobs:
No Jobs running.
====

Jobs waiting to reserve a drive:
====

Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name 
===================================================================
615847  Full      2,658    57.70 M  OK       10-Apr-14 06:14 CopyDiskToTape
615848  Incr      2,658    57.70 M  OK       10-Apr-14 06:14 Backup-troll
615849  Full      1,835    49.44 M  OK       10-Apr-14 06:15 CopyDiskToTape
615850  Incr      1,835    49.44 M  OK       10-Apr-14 06:15 Backup-idefix
615851  Full      3,424    34.59 M  OK       10-Apr-14 06:15 CopyDiskToTape
615852  Incr      3,424    34.59 M  OK       10-Apr-14 06:15 Backup-apollo
615853  Full      2,128    47.67 M  OK       10-Apr-14 06:15 CopyDiskToTape
615854  Incr      2,128    47.67 M  OK       10-Apr-14 06:15 Backup-paladin
615812  Full      3,636    136.4 M  OK       10-Apr-14 06:15 CopyDiskToTape
615855  Incr      3,636    136.4 M  OK       10-Apr-14 06:15 Backup-teamwork
====

Device status:
Autochanger "OVERLAND" with devices:
   Drive-1
   Drive-2
Autochanger "QTM-Scalar" with devices:
   "QTM-Drive-0" (/dev/qtm-nst0)
   "QTM-Drive-1" (/dev/qtm-nst1)

Device "FileStorage" (/disk0/bacula/files) is not open.
==

Device "Drive-1" is not open or does not exist.
==

Device "Drive-2" is not open or does not exist.
==

Device "QTM-Drive-0" (/dev/qtm-nst0) is mounted with:
    Volume:      BACU.113
    Pool:        DiskCopy
    Media type:  LTO-6
    Slot 14 is loaded in drive 0.
    Total Bytes=2,119,786,651,648 Blocks=8,086,384 Bytes/block=262,142
    Positioned at File=264 Block=0
==

Device "QTM-Drive-1" (/dev/qtm-nst1) is mounted with:
    Volume:      BACX.101
    Pool:        ExtClone
    Media type:  LTO-6
    Slot 1 is loaded in drive 1.
    Total Bytes Read=64,512 Blocks Read=1 Bytes/block=64,512
    Positioned at File=0 Block=0
==
====

Used Volume status:
Reserved volume: BACU.113 on tape device "QTM-Drive-0" (/dev/qtm-nst0)
    Reader=0 writers=0 reserves=0 volinuse=0
Reserved volume: BACX.101 on tape device "QTM-Drive-1" (/dev/qtm-nst1)
    Reader=0 writers=0 reserves=0 volinuse=0
====

Data spooling: 0 active jobs, 0 bytes; 142 total jobs, 112,034,563,261 max bytes/job.
Attr spooling: 0 active jobs, 8,716,940,756 bytes; 151 total jobs, 8,716,940,756 max bytes.
====



First log messages of the second copy job (JobId 615860):

2014-04-10 09:12:37 troll-sd JobId 615860: 3307 Issuing autochanger "unload slot 14, drive 0" command.
2014-04-10 09:14:33 troll-dir JobId 615860: Using Device "QTM-Drive-0" to write.
2014-04-10 09:14:33 troll-sd JobId 615860: 3307 Issuing autochanger "unload slot 1, drive 1" command.
2014-04-10 09:15:26 troll-sd JobId 615860: 3304 Issuing autochanger "load slot 1, drive 0" command.
2014-04-10 09:16:07 troll-sd JobId 615860: 3305 Autochanger "load slot 1, drive 0", status is OK.
2014-04-10 09:16:18 troll-sd JobId 615860: Volume "BACX.101" previously written, moving to end of data.
2014-04-10 09:17:22 troll-sd JobId 615860: Ready to append to end of Volume "BACX.101" at file=239.
2014-04-10 09:17:27 troll-sd JobId 615860: Elapsed time=00:00:05, Transfer rate=45.33 M Bytes/second





_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
