On 10 Sep 2007 at 11:05, DAve wrote:

> Kern Sibbald wrote:
> > This document contains the technical details of Bug #935.
> >
> > Bacula bug #935 reports that during a restore, a large number of files
> > are missing and thus not restored. This is really quite surprising
> > because we have a fairly extensive regression test suite that
> > explicitly tests for this kind of problem many times.
> >
> > Despite our testing, there is indeed a bug in Bacula that has the
> > following characteristics:
> >
> > 1. It happens only when multiple simultaneous Jobs are run (regardless
> >    of whether or not data spooling is enabled), and happens only when
> >    the Storage daemon is changing from one Volume to another -- i.e.
> >    the backups span multiple volumes.
> >
> > 2. It has been observed only on disk-based backups, not on tape.
> >
> > 3. Under the right circumstances (timing), it could and probably does
> >    happen on tape backups.
> >
> > 4. It seems to be timing dependent, and requires multiple clients to
> >    reproduce, although under the right circumstances, it should be
> >    reproducible with a single client doing multiple simultaneous
> >    backups.
> >
> > 5. Analysis indicates that it happens most often when the clients are
> >    slow (e.g. doing Incremental backups).
> >
> > 6. It has been verified to exist in versions 2.0.x and 2.2.x.
> >
> > 7. It should also be in version 1.38, but could not be reproduced in
> >    testing, perhaps due to timing considerations or the fact that the
> >    test FD daemons were version 2.2.2.
> >
> > 8. The data is correctly stored on the Volume, but incorrect index
> >    (JobMedia) records are stored in the database. (The JobMedia record
> >    generated during the Volume change contains the index of the new
> >    Volume rather than the previous Volume.) This will be described in
> >    more detail below.
> >
> > 9. You can prevent the problem from occurring by either turning off
> >    multiple simultaneous Jobs or by ensuring that while running
> >    multiple simultaneous Jobs that those Jobs do not span Volumes.
> >    E.g. you could manually mark Volumes as full when they are
> >    sufficiently large.
> >
> > 10. If you are not running multiple simultaneous Jobs, you will not be
> >     affected by this bug.
> >
> > 11. If you are running multiple simultaneous Jobs to tapes, I believe
> >     there is a reasonable probability that this problem could show up
> >     when Jobs are split across tapes.
> >
> > 12. If you are running multiple simultaneous Jobs to disks, I believe
> >     there is a high probability that this problem will show up when
> >     Jobs are split across disk Volumes.
> >
> > ===============================
> >
> > The problem comes from the fact that when the end of a Volume is
> > reached, the SD must generate a JobMedia (index) record for each of
> > the Jobs that is currently running. Since each job is in a separate
> > thread, the thread that does the Volume switch marks all the other
> > threads (Jobs) with a flag that tells them to update the catalog index
> > (JobMedia). Sometime later, when one of those flagged threads attempts
> > to do another write to the volume, it will create a JobMedia record.
> >
>
> If I read everything correctly, I believe we would be immune to this bug
> at this time. While we certainly use concurrent jobs, each job is
> written to a recycled volume each night. We have no jobs that span a
> volume at any time.
>
> Would that be a correct analysis?

I think so.

--
Dan Langille - http://www.langille.org/
Available for hire: http://www.freebsddiary.org/dan_langille.php
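
For readers following the thread, the mechanism quoted above (a flag set at the Volume switch, with the JobMedia record written only on the flagged job's next write) can be illustrated with a toy model. This is only a sketch of one reading of Kern's description, not Bacula code: the names Device, Job, switch_volume and write_block are hypothetical, and the model is single-threaded so that the ordering is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    volume: str  # name of the Volume currently mounted (hypothetical model)


@dataclass
class Job:
    name: str
    needs_jobmedia: bool = False                  # set during a Volume switch
    blocks: list = field(default_factory=list)    # Volume each block really landed on
    jobmedia: list = field(default_factory=list)  # Volume named in each index record


def switch_volume(dev, jobs, new_volume):
    # The job that hits end-of-Volume flags every running job,
    # then the new Volume is mounted.
    for job in jobs:
        job.needs_jobmedia = True
    dev.volume = new_volume


def write_block(dev, job):
    # Deferred index update: a flagged job writes its JobMedia record
    # on its next write and reads the Volume name at that moment.
    if job.needs_jobmedia:
        job.jobmedia.append(dev.volume)
        job.needs_jobmedia = False
    job.blocks.append(dev.volume)


dev = Device("Vol-0001")
job_a, job_b = Job("A"), Job("B")

write_block(dev, job_a)                        # both jobs write to Vol-0001
write_block(dev, job_b)
switch_volume(dev, [job_a, job_b], "Vol-0002") # job A reaches end of Volume
write_block(dev, job_b)                        # job B writes some time later

print("B data on:", job_b.blocks)
print("B index says:", job_b.jobmedia)
```

In this model job B's first block is on Vol-0001, but the index record that should close out that data names Vol-0002, which matches characteristic 8 in the quoted report: the data is on the Volume, and only the index is wrong. A site where no job ever spans a Volume never calls the equivalent of switch_volume mid-job, so the flag is never set.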