> > Hmm. Does the second job transfer the data from the FD 
> again? If so, 
> > then that doesn't (IMHO) quite do what I want to do here. I really 
> > want to transfer the data only once (the only guarantee we have of 
> > getting the same data on all the copies) and create the 
> replicas on the server side.
> 
> Yes, it starts a second job.  The disadvantage of this is 
> that the data is not 100% identical if anything is changing 
> on the FD.  The advantage is that it avoids a whole bunch of 
> complications that I have not logically resolved concerning 
> having two backups of the same thing in the same job.

Hmm. I don't think that would pass our auditors. If there's a significant
chance that the copies are not identical (and it sounds like this approach
pretty much guarantees that the copies will not be identical), I don't think
it would be sufficient or useful for this purpose. It does, however, make
implementation easier, as you said.

> > (As a side issue, I'm beginning to wonder if overall we need a more 
> > generalized job manager. This is sort of sounding like we need 
> > something like JCL, and then this could all be handled in a 
> more systematic way.
> > That's a much bigger project, though.)
> 
> Perhaps if I were starting to design Bacula with the 
> knowledge I have today, I would have a different structure.  
> However, I have to live with the current code, and at the 
> current time, I am, unfortunately, the only one who 
> understands it and who is continuously working on the 
> project.

Don't you feel lucky and needed? 8-)

>  Making any major design changes is not something I 
> can handle without a team of programmers.  By myself, I can 
> continue the same path I have taken over the years -- slowly 
> evolve it to provide all the functionality we want.

As I said, it's a MUCH bigger project. Not on the radar for today or
tomorrow, just musing a bit on something I was thinking about. What we've
got works; it's more thinking about where future simplifications might go. 

> This could be a way to do it, but it doesn't fit in with the 
> current Bacula scheme.  Any restore can have Volumes from 
> multiple pools (typically not from a single job).  Many users 
> separate their Volumes into Full, Diff, Inc pools.
> 
> So, IMO, unless I am missing something you are saying, a Pool 
> is not a good way to separate multiple copies.  I do have a 
> database column designed to indicate what copy a particular 
> Volume record is from (I also have a stripe database column). 
>  Since they are not yet fully implemented, they are not yet 
> stored in the DB to conserve space, but this info is passed 
> from the SD to the DIR.

I probably didn't explain it well. I don't think the approach conflicts at
all with current usage -- the primary pools can still be anything the user
designates. Copy pools are horizontal in nature, in *addition* to the
existing primary pool structure -- basically they provide a way of grouping
volumes so that copies of volumes in the primary pool are selected from a
designated set of volumes. So, example: 

Pool A (primary Full) --> Pool B (copypool 1 for Pool A) --> Pool C (copypool 2 for Pool A) --> etc.
Pool D (primary Diff) --> Pool E (copypool 1 for Pool D) --> Pool F (copypool 2 for Pool D) --> etc.

Let's say that Pool A is a pool containing volumes A1, A2, and A3.  Pool B
is a different pool, containing volumes B1, B2, and B3. Pool C contains
volumes C1, C2 and C3, and so forth. 
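
Just to make the chain structure concrete, here's how I picture the Director
keeping track of it -- a quick Python sketch, nothing more, with made-up
names (this is not existing Bacula code or configuration):

    # Hypothetical: which copypools hang off each primary pool.
    COPYPOOLS = {
        "Pool A": ["Pool B", "Pool C"],   # Full chain
        "Pool D": ["Pool E", "Pool F"],   # Diff chain
    }

    def pool_chain(primary):
        """All pools a backup to 'primary' should be written to, in order."""
        return [primary] + COPYPOOLS.get(primary, [])

    # pool_chain("Pool A") -> ["Pool A", "Pool B", "Pool C"]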

In a backup job, data is written to a volume selected from the primary pool,
say A2. If copypools are defined for the primary pool, the same data is
written to volumes selected from the designated copypool(s), say B1 and C3.
The idea of an SD-mux would allow this to be implemented without changing a
lot of the SD code -- jobs talk to the SD-mux, and the SD-mux looks at the
definition of the primary pool and then establishes N sessions with the SDs
managing the primary and copypools, one per pool. The SD-mux accepts the
write from the FD and returns a "write complete" only when all the end SDs
acknowledge the write. Each end SD selects a volume from its appropriate
pool using the same logic used today. If multiple jobs are active, that's
fine -- the SD-mux doesn't care, and an end SD will not try to select a
volume already in use by another job; e.g., Job 2 will get either A1 or A3
from Pool A, since A2 is already in use for another job and the SD for Pool
A already knows that. The same logic applies to the copypools.
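
To make that concrete, here's a rough Python sketch of the fan-out loop I
have in mind (the helpers like open_sd_session and write_block are invented
for illustration, not existing Bacula code):

    def sdmux_backup(primary_pool, copypools, fd_blocks,
                     open_sd_session, ack_to_fd):
        """Fan one FD data stream out to the SDs for the primary pool and
        each copypool; ack the FD only when every end SD has acked."""
        pools = [primary_pool] + copypools              # e.g. ["Pool A", "Pool B", "Pool C"]
        sessions = [open_sd_session(p) for p in pools]  # one session per pool
        for block in fd_blocks:                         # data arrives once from the FD
            for s in sessions:
                s.write_block(block)                    # each end SD picks its own volume
            if all(s.wait_write_complete() for s in sessions):
                ack_to_fd(block)                        # "write complete" only when all SDs ack
            else:
                raise IOError("one or more SDs failed to store the block")
        for s in sessions:
            s.close()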

The above should handle the consistency issue for the volumes neatly. As you
say, the problem is associating multiple volume residence records with a
file record in the database. What I was trying to suggest was that the
volume residence field (probably not the right name, but I'm talking about
the entry in the file record that indicates what volume the file is on)
could become a list of (1...n) volume records instead. Same data, just that
there can be more than 1 volume record associated with a file, defaulting to
the current 1 volume per file. In our above example, the database would
reflect one file record and three volume records for A2, B1, and C3 -- all
with the same data. 
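
And the catalog side of it, roughly -- the field names here are made up, I'm
only trying to show the 1..n shape of the thing, not the real Bacula schema:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class VolumeResidence:
        volume_name: str     # e.g. "A2", "B1", "C3"
        pool_name: str       # which pool (primary or copypool) it lives in
        copy_number: int     # 0 = primary copy, 1 = first copypool, ...

    @dataclass
    class FileRecord:
        path: str
        # defaults to a single entry, i.e. the current 1-volume-per-file case
        residences: List[VolumeResidence] = field(default_factory=list)

    # The example above: one file record, three identical copies.
    f = FileRecord("/some/file", [
        VolumeResidence("A2", "Pool A", 0),
        VolumeResidence("B1", "Pool B", 1),
        VolumeResidence("C3", "Pool C", 2),
    ])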

On a restore, you could examine the file record in the database, which would
tell you what volumes have copies of this file based on the list above. You
can then sort the list of files to be restored by volume (minimizing
mounts), check to see if any of the volumes in your list are already
mounted, and proceed from there, removing files from the restore list as you
successfully restore them. If you're unable to restore a file from one
volume, try the next volume in its list, i.e. the one from the next copypool
in the chain for the primary pool. (Note that the parallelism cited above
would also apply to multiple restore jobs, allowing several jobs to restore
the same file at the same time, subject to the number of copypools and
hardware availability -- volumes mounted and in use by another job are
already considered busy/unavailable by the volume selection algorithm.)

Using the example above, if the volume A2 containing the file to be restored
is missing or broken, the restore process looks at the copypool definition
for the primary pool, decides that volume B1 in Pool B is the next likely
candidate, retrieves the volume info for B1, and initiates a mount for B1,
repeating the restore process on B1. If that volume is also broken/gone,
then we try volume C3 from Pool C, and so on until we hit the end of the
copypool
chain. If we still haven't successfully restored the file, then we return an
error. 
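
Putting the two pieces together, the restore-side selection and fallback
might look roughly like this, building on the file-record sketch above
(made-up helper names again, just to show the shape of it):

    from collections import defaultdict

    def plan_restore(file_records, volume_available):
        """Group the restore list by volume to minimize mounts, picking the
        first available copy from each file's residence list.
        'file_records' are FileRecord objects as sketched above;
        'volume_available' would consult the catalog (InChanger flag etc.)."""
        by_volume = defaultdict(list)
        for f in file_records:
            for r in f.residences:            # primary copy first, then copypools
                if volume_available(r.volume_name):
                    by_volume[r.volume_name].append(f)
                    break
            else:
                raise RuntimeError("no available copy of " + f.path)
        return by_volume                      # {volume: [files to read from it]}

    # If a read from a volume fails partway through, drop that volume from
    # the candidates and re-plan the leftover files -- that's the walk from
    # A2 to B1 to C3 described above.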

Does that explain it a bit better? If not, please tell me where I'm not
making sense. 

(Also as an aside, if we do implement pool migration, then each of these
pools (primary and copypools) can be treated as a separate migration chain
with individual volume and pool migration thresholds, and the program logic
necessary to implement migration is identical for each chain.  Job spooling
also becomes a special case of migration from a disk pool to a tape pool,
and we get multiple copies of the output tapes for free in the process if we
do copypools with identical migration thresholds.) 
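
For what it's worth, the "same logic per chain" point could be as simple as
running one routine once per chain -- a rough sketch with invented helpers,
not real Bacula code:

    def run_migration(pool, next_pool, occupancy, threshold, migrate_volume):
        """Migrate volumes from 'pool' to 'next_pool' once this chain's
        occupancy crosses its own threshold."""
        for vol in sorted(pool.volumes, key=lambda v: v.last_written):
            if occupancy(pool) <= threshold(pool):
                break                     # back under this chain's threshold
            migrate_volume(vol, next_pool)

    # Run it over the primary chain and over each copypool chain with the
    # same code; job spooling is then just a disk-to-tape chain with a very
    # low threshold.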


> > With the approach above, just taking the volumes in and out of the 
> > changers does the job for you. No new wheels needed.
> 
> Yes, this would work for big shops where everything is in the 
> changer, but for the other 99.9% of us who either don't have 
> changers or who are obligated to remove Volumes from the 
> changers, it would leave the problem of deciding what Volume 
> to take, and how to tell Bacula in a user friendly way that 
> certain Volumes may be offsite.

One conceptual way around that difference is to treat a manual tape drive as
a 1-slot changer with 1 drive, where the changer script becomes a small
program that says "insert volume XXX. Type 1 when ready" or something like
that to a designated location (/dev/console or bconsole or something like
that). Then everyone *has* changers, and the conceptual problem is
diminished.
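
Something along these lines -- a toy sketch only. I'm assuming the usual
mtx-changer-style argument order (changer-device, command, slot,
archive-device, drive), and a real version would talk to /dev/console or
bconsole rather than stdin/stderr:

    import sys

    def main():
        _changer, cmd, slot = sys.argv[1], sys.argv[2], sys.argv[3]
        if cmd == "load":
            print("Insert the volume for slot %s and press Enter when ready."
                  % slot, file=sys.stderr)
            input()                      # wait for the operator
        elif cmd == "unload":
            print("Remove the current volume and press Enter when ready.",
                  file=sys.stderr)
            input()
        elif cmd == "loaded":
            print(slot)                  # simplification: report the requested slot
        elif cmd == "slots":
            print(1)                     # exactly one "slot"
        elif cmd == "list":
            print("1:")                  # one slot, volume name unknown

    if __name__ == "__main__":
        main()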

In my earlier note, along with the above idea, I assumed that removing the
volumes physically would be accompanied by updating the InChanger flag to
indicate that the volume is not physically available. If the InChanger flag
is 0 for a specific volume, the volume is treated as unavailable, and the
process I described above automatically falls back to the next copypool
volume in the chain until we find one that *is* available.

BTW, this is the function I was talking about in implementing a DR manager
-- determine which volumes were used in a backup sequence, generate a list
to remove, update the appropriate DB fields for those volumes, and
(optionally) eject them from the changer if stored in one. The DR manager
would have to track movement from location to location (possibly express
that as moving from changer to changer, or adding a location field to the
volume record if there isn't one already), but I think that really comes
down to two things: the discipline to *use* such a tool in a consistent
manner, and matching it to a business process -- and neither of those is
something Bacula can force someone to do.
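
Roughly, the DR manager's main pass would be something like this (invented
helper names again, not existing Bacula code or catalog calls):

    def dr_cycle(jobs_in_sequence, volumes_for_job, mark_offsite, eject):
        """Collect the volumes a backup sequence used, flag them as out of
        the changer / offsite in the catalog, and optionally eject them."""
        to_remove = set()
        for job in jobs_in_sequence:
            to_remove.update(volumes_for_job(job))
        for vol in sorted(to_remove):
            mark_offsite(vol)          # e.g. InChanger = 0 plus a location field
            eject(vol)                 # no-op for volumes not in a changer
        return sorted(to_remove)       # the pick list handed to the operator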

-- db



