Found this in the manual at:
http://www.bacula.org/5.2.x-manuals/en/main/main/Configuring_Director.html#SECTION0014150000000000000000


*Maximum Volume Jobs = positive-integer* This directive specifies the
maximum number of Jobs that can be written to the Volume. If you specify
zero (the default), there is no limit. Otherwise, when the number of Jobs
backed up to the Volume equals *positive-integer*, the Volume will be marked
*Used*. When the Volume is marked *Used* it can no longer be used for
appending Jobs, much like the *Full* status, but it can be recycled if
recycling is enabled, and thus used again. By setting *MaximumVolumeJobs* to
one, you get the same effect as setting *UseVolumeOnce = yes*.
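For reference, the directive lives in a Pool resource in bacula-dir.conf.
A minimal sketch (the pool name and values here are just illustrative, not
from your config):

```conf
# bacula-dir.conf -- hypothetical Pool limiting jobs per volume
Pool {
  Name = FilePool            # placeholder pool name
  Pool Type = Backup
  Recycle = yes              # allow Used volumes to be recycled
  Maximum Volume Jobs = 1    # volume is marked Used after one job
}
```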

The value defined by this directive in the bacula-dir.conf file is the
default value used when a Volume is created. Once the volume is created,
changing the value in the bacula-dir.conf file will not change what is
stored for the Volume. To change the value for an existing Volume you must
use the *update* command in the Console.
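For an existing Volume, something like this in bconsole should do it (the
volume name is made up; I think the keyword is MaxVolJobs, but check the
console help if it complains):

```conf
# in bconsole; "Vol0001" is a placeholder volume name
update volume=Vol0001 MaxVolJobs=5
```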

If you are running multiple simultaneous jobs, this directive may not work
correctly because when a drive is reserved for a job, this directive is not
taken into account, so multiple jobs may try to start writing to the
Volume. At some point, when the Media record is updated, multiple
simultaneous jobs may fail since the Volume can no longer be written.


Look at the third paragraph.  I imagine this may be what you're hitting.
 Basically, limiting the number of jobs per volume fights with the
multiplexing the SD does to handle multiple concurrent jobs to the same
pool.  I think this would potentially be a problem regardless of the max
volume jobs, but I think you'd get bitten by it more often the lower the
setting.

I think your two options are job-per-pool (what I do) or multiple
concurrent jobs to one pool, and let the SD be free to multiplex jobs onto
volumes however it sees fit.

I'm curious -- are you really wanting exactly one job per volume (and then
I'd be curious why?), or are you rather trying to limit the size of the
files backing the volumes, perhaps to make restores less time-consuming (I
don't know that it does) or to work around filesystem limitations?  In the
latter cases, I think you'd be able to set Max Volume Files and/or Max
Volume Bytes and NOT stomp on the multiplexing the SD does.
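E.g., something along these lines in the Pool resource (sizes are just
examples I made up) would cap volume size while leaving jobs-per-volume
unlimited:

```conf
# bacula-dir.conf -- cap volume size instead of jobs per volume
Pool {
  Name = FilePool              # placeholder pool name
  Pool Type = Backup
  Maximum Volume Bytes = 5G    # roll to a new volume at roughly 5 GB
  # Maximum Volume Jobs deliberately left unset (0 = no limit)
}
```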

I think the race condition comes from the delay in the media record update
(the docs say it happens "at some point").  This could have a lot of
variation in behavior depending on the specific timings of the kind of jobs
you were doing.

Also, earlier you said "To this end I have multiple devices 'pointing' at
the same disk based pool."  I am confused (and of course I haven't looked
at your config -- sorry about that) because pools reference Storage
resources, which in turn reference Device resources, which are defined in
the SD's configuration.  Do you rather mean that you have multiple Device
resources in the SD config that are using the same "Archive Device" and
"Media Type"?
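i.e., do you have something like this in bacula-sd.conf?  (The names and
path are guesses on my part, just to illustrate the shape I mean):

```conf
# bacula-sd.conf -- two Device resources sharing one directory and
# Media Type (hypothetical names/path)
Device {
  Name = FileStorage1
  Media Type = File
  Archive Device = /backup/bacula   # same path in both
  Device Type = File
  Random Access = yes
  Automatic Mount = yes
  Label Media = yes
}
Device {
  Name = FileStorage2
  Media Type = File
  Archive Device = /backup/bacula   # same path again
  Device Type = File
  Random Access = yes
  Automatic Mount = yes
  Label Media = yes
}
```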

The manual also says (see the warning near the end):

*Device = device-name* This directive specifies the Storage daemon's name of
the device resource to be used for the storage. If you are using an
Autochanger, the name specified here should be the name of the Storage
daemon's Autochanger resource rather than the name of an individual device.
This name is not the physical device name, but the logical device name as
defined on the *Name* directive contained in the *Device* or the
*Autochanger* resource definition of the *Storage daemon* configuration
file. You can specify any name you would like (even the device name if you
prefer) up to a maximum of 127 characters in length. The physical device
name associated with this device is specified in the *Storage daemon*
configuration file (as *Archive Device*). Please take care not to define two
different Storage resource directives in the Director that point to the same
Device in the Storage daemon. Doing so may cause the Storage daemon to block
(or hang) attempting to open the same device that is already open. This
directive is required.
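In other words, the warning is about Director-side Storage resources like
these (names, address, and password are all made up) both naming the same
SD Device:

```conf
# bacula-dir.conf -- two Storage resources, one SD Device: the manual
# warns the SD may block trying to open a device that is already open
Storage {
  Name = FileA
  Address = sd.example.com     # hypothetical SD address
  Password = "secret"
  Device = FileStorage         # same Device name here...
  Media Type = File
}
Storage {
  Name = FileB
  Address = sd.example.com
  Password = "secret"
  Device = FileStorage         # ...and here
  Media Type = File
}
```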


Maybe you're running into that.

Hope that helps.  I'd look at your configs but I am heading out on vacation
and haven't finished packing :)

-Jonathan Hankins



On Tue, Dec 17, 2013 at 9:00 PM, Mike Brady <mike.br...@devnull.net.nz> wrote:

> Hi
>
> If this isn't supposed to work then that would mean that there can
> only ever be one volume per pool mounted for writing at any one time.
> Is that a known Bacula limitation?
>
> I believe that the jobs are "working".  I have restored a number of
> the volumes and they have always contained what I thought they should,
> but as you say I may just have been lucky.
>
> I have run the directory in debug mode on my test system today and it
> looks like there is a race of some sort because multiple jobs are
> picking up the same volume initially, but only one job ever uses that
> volume.  The other jobs always seem to move on to another volume. At
> least I think that is what the log is showing.  I have attached the
> debug log.
>
> Bug or by design something isn't right, so I guess it is back to the
> drawing board :-(
>
> Thanks
>
> Mike
>
> Quoting Jonathan Hankins <jhank...@homewood.k12.al.us>:
>
> > Heya,
> >
> >
> > On Tue, Dec 17, 2013 at 2:42 PM, Mike Brady <mike.br...@devnull.net.nz> wrote:
> >>
> >> I am doing one job per volume which means that I need to have multiple
> >> volumes from the same pool mounted at the same time in order to do
> >> concurrent jobs.
> >
> >
> > I don't think this is supposed to work.  I wanted to do something
> > similar, and I wound up doing one pool/storage/media type/etc. per
> > job, and wrote a script to generate my configs from templates.  I
> > made it a bit too complex, as most of my jobs turned out looking the
> > same, and I didn't need as much flexibility as I designed in, but
> > you can knock something together pretty quickly.
> >
> >
> >
> >> To this end I have multiple devices "pointing" at
> >> the same disk based pool.  This works except for the intermittent
> >> problem with allocating a volume as indicated in my original post.
> >
> >
> > I think that the concurrent writers support happens in the SD, at
> > the pool level, by interleaving writes from concurrent jobs to the
> > volume mounted for that pool.  If I had to guess, I'd say it either
> > appears to be working, but isn't really writing out jobs correctly,
> > or you're just getting really lucky.
> >
> > -Jonathan
> >
> > --
> > ------------------------------------------------------------------------
> > Jonathan Hankins    Homewood City Schools
> >
> > The simplest thought, like the concept of the number one,
> > has an elaborate logical underpinning. - Carl Sagan
> >
> > jhank...@homewood.k12.al.us
> > ------------------------------------------------------------------------
> >
>
>


-- 
------------------------------------------------------------------------
Jonathan Hankins    Homewood City Schools

The simplest thought, like the concept of the number one,
has an elaborate logical underpinning. - Carl Sagan

jhank...@homewood.k12.al.us
------------------------------------------------------------------------
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
