Hello,

Yes, sorry, I pulled up your recent emails and accidentally clicked on the wrong
one to reply to.

On Monday 18 January 2010 12:56:57 Graham Keeling wrote:
> Hello,
> You've replied to an (I think) unrelated post about a different set of bugs
> (Re: [Bacula-devel] bconsole restore bug - option 12.).
>
> This is 'Re: Bug #1410 - Broken VirtualFull volumes and bacula-sd
> segfault', so I've adjusted the 'In-Reply-To' header so that it will
> hopefully end up in the right thread.
>
> On Mon, Jan 18, 2010 at 11:53:24AM +0100, Kern Sibbald wrote:
> > Hello Graham,
> >
> > I have spent some time reviewing this bug.  Thanks for the database and
> > conf file, I was able to easily reproduce what you saw.
> >
> > It seems to me that there are three problems here -- unfortunately all
> > possibly quite serious:
> >
> > 1. The Volume is corrupt.
> > 2. When Bacula reads the corrupt data, it smashes its stack
> > 3. The FileIndex records are not sequential.
> >
> > 1. Do you have any idea how the Volume got corrupted?
> >
> > It looks like the bad data is associated with the JobSession records
> > (when a job starts, Bacula writes a label to the tape indicating the
> > beginning of a job).
>
> Unfortunately, I do not know how the volume got corrupted so that the
> programs crash.
> However, I do know that running a VirtualFull always messes up some indexes
> so that tools like bls complain. I thought that the crashing was probably
> related to this somehow.
> Bug #1410 talks about this, and it is easy for me to reproduce:
>
> a) Do a Full backup
> b) Do an Incremental backup
> c) Do a VirtualFull backup
>
> bls now complains about the index on the VirtualFull volume.
> If I do no Incremental, the VirtualFull is fine.
>
> > Have you used a modified SD in producing this Volume?
>
> No.
>
> > If so, then there is a bug in the code.  If not, I would like to learn
> > more about the history of this Volume.
>
> The Job was set to do an Incremental every night, with a VirtualFull once a
> week.
> I have a Pool for Fulls and a Pool for Incrementals.
> 'NextPool' on the Incremental Pool is set to the Full Pool, so all
> VirtualFulls end up in the Full Pool.
>
> It had been running perhaps a couple of months.
> It is only backing up a few megabytes of files, and the mysql database via
> a fifo.
> There is a directory containing lots of hard links. It is possible that
> this is causing some trouble, since bls crashes whilst listing them. The
> structure is like this...
> /var/log/change_tracker/oc/XXXXXX/(lots of small files called the same
> thing) ...where XXXXXX is a timestamp - the lots of small files are hard
> linked across the timestamped directories.
> There is one job per volume. Volumes are recycled after a couple of weeks.
>
> The idea was to never do a Full, and to rely on the VirtualFulls to
> consolidate the Incrementals.
> Perhaps the continual shuffling of the indexes eventually causes the crash.
>
> Once the bacula-sd starts crashing, no more VirtualFulls can happen.

Yes, that makes sense because anything reading through the broken records will 
crash.

>
> > 2. Bacula trashes its stack when it encounters the bad Session records.
> > Unfortunately, the serial code used to write and extract labels is very
> > old and didn't properly protect itself from bad data on the Volume.  I
> > have now modified the current source code to fix this problem.  With the
> > fix, Bacula reads through the whole Volume, and does not crash.  I have
> > committed this to the master branch on Source Forge.
>
> OK, this is a good start. :)
>
> > 3. The FileIndex records in the Attribute records do not correspond to
> > the record sequence numbers.  This is what caused Bacula to fail the bls
> > (the -p option allows it to continue).  I haven't looked at this yet, but
> > will start looking at the VirtualFull code to see if it was an oversight
> > on my part.  If you know more about how the Volume was created and what
> > kind of Jobs are on it, please let me know as it may help get to the
> > bottom of the problem.
>
> Additional to my description above, here is the FileSet:
>
> FileSet {
>   Name = "FileSetX"
>   Ignore FileSet Changes = yes
>   Include {
>     Options {
>       signature = MD5
>       compression = GZIP9
>       readfifo = yes
>     }
>     File=/chroot/write/html
>     File=/chroot/write/lib/squirrelmail/data
>     File=/chroot/write/var/spool/filter/quarantine
>     File=/chroot/write/var/spool/mail
>     File=/var/spool/postfix
>     File=/chroot/write/share
>     File=/write/home
>     File=/chroot/write/precache
>     File=/chroot/write/var/lib/mysql/logs
>     File=/var/log
> # Fifo output from clientrunscriptbackup (backup-out and mysqldump)
>     File=/var/spool/backup-out/MYSQLPIPE
>     File=/var/spool/backup-out/SYSTEMPIPE
>     File=/var/spool/backup-out/SPAMPIPE
>   }
>   Exclude {
>     File=/chroot/write/share/backup
>   }
> }
>
> > The problem with FileIndex records out of order is that restore by file
> > will not work correctly; even a full restore may not get all the records.
> >  The records *can* be extracted, but getting them all might require editing
> > the bsr file or using bextract without any bsr ...  This is not good.

The above turns out not to be the case, so no need to worry about this.

> >
> > There is a bug report open on restore problems related to VirtualFull
> > jobs, so possibly this is related.  I will look into that.
> >
> > Could you give me the exact commands that caused this problem in the
> > beginning?  I.e. you refer to restore bug -- option 12.  I would like to
> > see what you wanted to do with option 12.  The more info I have the
> > easier it will be to find and fix the problem.  Many thanks.

It turns out that I was doubly wrong above.  First, sorry for the option 12
confusion: as I said, I clicked on the wrong email and then got confused.

The second "wrong" is that, for VirtualFull, it is perfectly normal for the
FileIndex values at a low level to be out of order.  I had forgotten about
that.  The FileIndex values at the record level are sequential, and that is
all that is important.
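To illustrate the distinction, here is a minimal C sketch. It assumes, as an interpretation of the explanation above rather than a description of Bacula's code, that each record carries a FileIndex in its record header, that attribute data also carries a FileIndex, and that a VirtualFull renumbers the header value while copying the attribute value unchanged from the original job. The struct rec_model and the functions strict_fi_check and relaxed_fi_check are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one file record (not Bacula's own record struct).
 * Session label records are left out of this model. */
struct rec_model {
    long header_fi;   /* FileIndex in the record header */
    long attr_fi;     /* FileIndex inside the attribute data (modelled on
                         every record for simplicity) */
};

/* Strict check, in the spirit of the removed sanity code: the header
 * FileIndex must equal the one inside the attributes. Under the
 * assumption above, this fails for consolidated (VirtualFull) data. */
bool strict_fi_check(const struct rec_model *recs, size_t n)
{
    assert(recs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (recs[i].header_fi != recs[i].attr_fi) {
            return false;
        }
    }
    return true;
}

/* Relaxed check: only the header values must be sequential, i.e. start
 * at 1 and then either repeat (another record of the same file) or
 * advance by exactly 1 (the next file). attr_fi is ignored. */
bool relaxed_fi_check(const struct rec_model *recs, size_t n)
{
    assert(recs != NULL || n == 0);
    long prev = 0;
    for (size_t i = 0; i < n; i++) {
        long fi = recs[i].header_fi;
        bool ok = (i == 0) ? (fi == 1) : (fi == prev || fi == prev + 1);
        if (!ok) {
            return false;
        }
        prev = fi;
    }
    return true;
}
```

With header values 1, 1, 2, 3 and attribute values 5, 5, 2, 9, the strict check fails while the relaxed check passes, which is the kind of false alarm the removed sanity checks would raise.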

After looking at this more, I see that bls, bscan, and bextract contained
sanity-checking code requiring the FileIndex (FI) to be sequential.  That was
fine before VirtualFull, but it should have been removed when I implemented
VirtualFull.

Those sanity checks are now removed, so the problems with bug #1410 should now 
all be past history.

>
> As I say, 'restore bug option 12' was something different.
>
> I first noticed the problem that this mail was about when bacula-sd started
> crashing on scheduled VirtualFulls for some of our users. So there was no
> command in the beginning, other than scheduling VirtualFulls.
>
> I had a look at bugs.bacula.org, and found Bug #1410, which looked like it
> might be related, and discovered that all my VirtualFulls report
> 'Record header file index X not equal record index Y'.
>
> > I will report back when I have more info ...
>

From my point of view, all is OK now.  The File Index reports were a false 
alarm, and the crash was due to corrupted data on the Volume.  Bacula now 
protects itself from the corrupted data and at least is able to read through 
your Volume.
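The fix itself is not shown here. As a rough illustration of the kind of defensive unserialization described above, this C sketch copies a string from an untrusted label buffer into a fixed-size array and reports failure instead of writing past the end of it. The function unser_bounded_string and its signature are hypothetical; they are not Bacula's serial API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a NUL-terminated string out of an untrusted
 * label buffer without ever writing past dst_size.
 *
 * buf/buf_len  : remaining bytes of the record being unserialized
 * dst/dst_size : caller's fixed-size buffer (e.g. a stack array)
 *
 * Returns the bytes consumed from buf (string plus its NUL), or 0 if the
 * data is malformed: no terminator within buf_len bytes, or a string too
 * long for dst. On failure dst is left as an empty string.
 */
size_t unser_bounded_string(const char *buf, size_t buf_len,
                            char *dst, size_t dst_size)
{
    assert(buf != NULL && dst != NULL && dst_size > 0);
    dst[0] = '\0';

    /* Look for the terminator only within the bytes we actually have. */
    const char *nul = memchr(buf, '\0', buf_len);
    if (nul == NULL) {
        return 0;                      /* unterminated: corrupt record */
    }
    size_t len = (size_t)(nul - buf);  /* length excluding the NUL */
    if (len >= dst_size) {
        return 0;                      /* would overflow dst: corrupt record */
    }
    memcpy(dst, buf, len + 1);         /* copy the string and its NUL */
    return len + 1;
}
```

Returning 0 lets the caller treat the whole label as bad and move on, rather than trusting a length that came from corrupted Volume data.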

What I cannot explain is why you have corrupted data on your Volume.  As best 
I can tell, it is only in Job Session records, which is very odd.  This seems 
most likely to come from something (such as VirtualFull or some other code) 
reading the session records, then writing junk in their place.

I'll look more carefully at how VirtualFull handles them and possibly add some
code that warns when a bad record is found.
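This is only mentioned as a possibility. A minimal sketch of "warn instead of crash" might walk a block of length-prefixed records, check each declared length against the bytes remaining, and stop with a warning at the first inconsistency. The framing used here (a 4-byte big-endian length followed by the payload) and the function name scan_records_with_warning are assumptions for illustration, not Bacula's block or record format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical framing: each record is a 4-byte big-endian payload length
 * followed by that many payload bytes.
 *
 * Returns how many well-formed records precede the first problem. If a
 * record header or its declared length runs past the end of the block,
 * a warning is printed and scanning stops instead of reading out of bounds.
 */
size_t scan_records_with_warning(const uint8_t *block, size_t block_len)
{
    assert(block != NULL || block_len == 0);
    size_t pos = 0, count = 0;

    while (pos < block_len) {
        if (block_len - pos < 4) {
            fprintf(stderr, "Warning: truncated record header at offset %zu\n",
                    pos);
            break;
        }
        uint32_t len = ((uint32_t)block[pos] << 24) |
                       ((uint32_t)block[pos + 1] << 16) |
                       ((uint32_t)block[pos + 2] << 8) |
                       (uint32_t)block[pos + 3];
        pos += 4;
        if (len > block_len - pos) {
            fprintf(stderr, "Warning: record %zu claims %lu bytes, only %zu "
                    "left; ignoring rest of block\n",
                    count, (unsigned long)len, block_len - pos);
            break;
        }
        pos += len;   /* a real reader would hand the payload on here */
        count++;
    }
    return count;
}
```

Stopping at the first bad record keeps the reader from trusting later offsets computed from a corrupt length; a fuller implementation might try to resynchronize on the next valid block instead.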

Kern

_______________________________________________
Bacula-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-devel
