On 10 Sep 2007 at 11:05, DAve wrote:

> Kern Sibbald wrote:
> >        This document contains the technical details of Bug #935.
> > 
> > Bacula bug #935 reports that during a restore, a large number of files
> > are missing and thus not restored.  This is really quite surprising
> > because we have a fairly extensive regression test suite that explicitly
> > tests for this kind of problem many times.
> > 
> > Despite our testing, there is indeed a bug in Bacula that has the following 
> > characteristics:
> > 
> > 1. It happens only when multiple simultaneous Jobs are run (regardless of 
> > whether or not data spooling is enabled), and happens only when the 
> > Storage daemon is changing from one Volume to another -- i.e. the
> > backups span multiple volumes.
> > 
> > 2. It has only been observed on disk-based backups, not on tape.
> > 
> > 3. Under the right circumstances (timing), it could and probably does
> > happen on tape backups.
> > 
> > 4. It seems to be timing dependent, and requires multiple clients to 
> > reproduce, although under the right circumstances, it should be reproducible
> > with a single client doing multiple simultaneous backups.
> > 
> > 5. Analysis indicates that it happens most often when the clients are slow 
> > (e.g. doing Incremental backups).
> > 
> > 6. It has been verified to exist in versions 2.0.x and 2.2.x.
> > 
> > 7. It should also be present in version 1.38, but could not be reproduced
> > in testing, perhaps due to timing considerations or the fact that the
> > test FD daemons were version 2.2.2.
> > 
> > 8. The data is correctly stored on the Volume, but incorrect index
> > (JobMedia) records are stored in the database (the JobMedia record
> > generated during the Volume change contains the index of the new Volume
> > rather than the previous Volume).  This will be described in more detail
> > below.
> > 
> > 9. You can prevent the problem from occurring either by turning off
> > multiple simultaneous Jobs or by ensuring that, while multiple
> > simultaneous Jobs are running, those Jobs do not span Volumes.  E.g. you
> > could manually mark Volumes as Full when they are sufficiently large
> > (see the sketch after this list).
> > 
> > 10. If you are not running multiple simultaneous Jobs, you will not be 
> > affected by this bug.
> > 
> > 11. If you are running multiple simultaneous Jobs to tapes, I believe
> > there is a reasonable probability that this problem could show up when
> > Jobs are split across tapes.
> > 
> > 12. If you are running multiple simultaneous Jobs to disks, I believe
> > there is a high probability that this problem will show up when Jobs are
> > split across disk Volumes.
> > 
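[As a rough sketch of the workaround in point 9 -- the resource and Volume
names below are placeholders, and the directives should be checked against
your own bacula-dir.conf and bconsole before relying on them:]

    # bacula-dir.conf, Director resource: disable concurrency entirely
    Director {
      Name = backup-dir              # placeholder name
      ...
      Maximum Concurrent Jobs = 1
    }

    # or, keep concurrency but stop Jobs from spanning Volumes by marking
    # a Volume as Full from bconsole before it actually fills:
    *update volume=Vol0001 volstatus=Full
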
> > ===============================
> > 
> > The problem comes from the fact that when the end of a Volume is reached,
> > the SD must generate a JobMedia (index) record for each Job that is
> > currently running.  Since each Job runs in a separate thread, the thread
> > that does the Volume switch marks all the other threads (Jobs) with a
> > flag that tells them to update the catalog index (JobMedia).  Sometime
> > later, when such a flagged thread attempts another write to the Volume,
> > it will create a JobMedia record.
> > 
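[For what it's worth, here is a minimal C++ sketch of the mechanism just
described and of where the index goes wrong.  This is not Bacula's actual
source; JobCtx, switch_volume, make_jobmedia and the ids are invented purely
for illustration:]

    // Simplified model of the SD behaviour described above -- not Bacula code.
    #include <cstdio>
    #include <vector>

    struct JobMedia {                // catalog index record (simplified)
        int job_id;
        int media_id;                // Volume the Job's data is indexed on
    };

    struct JobCtx {                  // one per running Job (a thread in the SD)
        int  job_id;
        bool need_jobmedia;          // set when a Volume switch happens
    };

    static int current_media_id = 1; // Volume currently mounted

    // The Job that hits end-of-Volume flags the running Jobs so that they
    // close out their index entries; the new Volume then becomes current.
    void switch_volume(std::vector<JobCtx>& jobs) {
        for (auto& j : jobs) j.need_jobmedia = true;
        ++current_media_id;
    }

    // On its next write, a flagged Job emits the deferred JobMedia record.
    // The bug: it uses current_media_id (the NEW Volume) instead of the
    // Volume its data was actually written to.
    JobMedia make_jobmedia(JobCtx& j) {
        j.need_jobmedia = false;
        return {j.job_id, current_media_id};   // should be the OLD media id
    }

    int main() {
        std::vector<JobCtx> jobs = {{101, false}, {102, false}};
        switch_volume(jobs);         // end of Volume 1, Volume 2 mounted
        for (auto& j : jobs) {
            JobMedia jm = make_jobmedia(j);
            std::printf("Job %d indexed on Volume %d (data is on Volume 1)\n",
                        jm.job_id, jm.media_id);
        }
        return 0;
    }

[A fix along these lines would be to remember the old Volume's id at switch
time and use that when the deferred record is finally written, which matches
the description in point 8 above.]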
> 
> If I read everything correctly, I believe we would be immune to this bug 
> at this time. While we certainly use concurrent jobs, each job is 
> written to a recycled volume each night. We have no jobs that span more
> than one volume at any time.
> 
> Would that be a correct analysis?

I think so.

-- 
Dan Langille - http://www.langille.org/
Available for hire: http://www.freebsddiary.org/dan_langille.php



