Hello,
You've replied to what I think is an unrelated post about a different set of bugs
(Re: [Bacula-devel] bconsole restore bug - option 12.).
This is 'Re: Bug #1410 - Broken VirtualFull volumes and bacula-sd segfault', so
I've adjusted the 'In-Reply-To' header so that it will hopefully end up in
the right thread.
On Mon, Jan 18, 2010 at 11:53:24AM +0100, Kern Sibbald wrote:
> Hello Graham,
>
> I have spent some time reviewing this bug. Thanks for the database and conf
> file, I was able to easily reproduce what you saw.
>
> It seems to me that there are three problems here -- unfortunately all
> possibly quite serious:
>
> 1. The Volume is corrupt.
> 2. When Bacula reads the corrupt data, it smashes its stack
> 3. The FileIndex records are not sequential.
>
> 1. Do you have any idea how the Volume got corrupted?
>
> It looks like the bad data is associated with the JobSession records (when a
> job starts, Bacula writes a label to the tape indicating the beginning of a
> job).
Unfortunately, I do not know how the volume got corrupted in a way that makes
the programs crash.
However, I do know that running a VirtualFull always produces a volume with
non-sequential FileIndex records, so tools like bls complain. I thought the
crashing was probably related to this somehow.
Bug #1410 talks about this, and it is easy for me to reproduce:
a) Do a Full backup
b) Do an Incremental backup
c) Do a VirtualFull backup
bls now complains about the FileIndex records on the VirtualFull volume.
If I skip the Incremental, the VirtualFull is fine.
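The reproduction steps above, expressed as bconsole commands (a sketch; the
job name is a placeholder, and the bls volume/device arguments would need to
match your Storage configuration):

```
# In bconsole:
run job=BackupJob level=Full yes
run job=BackupJob level=Incremental yes
run job=BackupJob level=VirtualFull yes

# Then inspect the resulting VirtualFull volume, e.g.:
#   bls -j -V <VirtualFullVolume> <archive-device-name>
# which reports 'Record header file index X not equal record index Y'.
```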
> Have you used a modified SD in producing this Volume?
No.
> If so, then there is a bug in the code. If not, I would like to learn more
> about the history of this Volume.
The Job was set to do an Incremental every night, with a VirtualFull once a
week.
I have a Pool for Fulls and a Pool for Incrementals.
'NextPool' on the Incremental Pool is set to the Full Pool, so all VirtualFulls
end up in the Full Pool.
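To illustrate, the Pool setup looks roughly like this (a sketch; the resource
names and retention values are placeholders, not my exact configuration):

```
Pool {
  Name = Inc-Pool
  Pool Type = Backup
  NextPool = Full-Pool        # VirtualFulls are written to the Full Pool
  Recycle = yes
  Volume Retention = 2 weeks  # volumes recycled after a couple of weeks
}

Pool {
  Name = Full-Pool
  Pool Type = Backup
  Recycle = yes
  Volume Retention = 2 weeks
}
```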
It had been running perhaps a couple of months.
It only backs up a few megabytes of files, plus the MySQL database via a FIFO.
There is a directory containing lots of hard links. It is possible that this
is causing some trouble, since bls crashes whilst listing them. The structure
is like this...
/var/log/change_tracker/oc/XXXXXX/(lots of small files called the same thing)
...where XXXXXX is a timestamp - the lots of small files are hard linked across
the timestamped directories.
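The layout can be reproduced in miniature like this (a sketch under /tmp for
inspection; the paths are illustrative, not the real /var/log/change_tracker
tree):

```shell
# Two timestamped directories sharing one hard-linked file.
mkdir -p /tmp/ct/20100101120000 /tmp/ct/20100102120000
echo data > /tmp/ct/20100101120000/state.log
# Create a second name for the same inode in the other directory:
ln /tmp/ct/20100101120000/state.log /tmp/ct/20100102120000/state.log
# Both names show link count 2 and the same inode number:
stat -c '%h %i %n' /tmp/ct/*/state.log
```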
There is one job per volume. Volumes are recycled after a couple of weeks.
The idea was to never do a Full, and to rely on the VirtualFulls to consolidate
the Incrementals.
Perhaps the continual shuffling of the indexes eventually causes the crash.
Once the bacula-sd starts crashing, no more VirtualFulls can happen.
> 2. Bacula trashes its stack when it encounters the bad Session records.
> Unfortunately, the serial code used to write and extract labels is very old
> and didn't properly protect itself from bad data on the Volume. I have now
> modified the current source code to fix this problem. With the fix, Bacula
> reads through the whole Volume, and does not crash. I have committed this to
> the master branch on Source Forge.
OK, this is a good start. :)
> 3. The FileIndex records in the Attribute records do not correspond to the
> record sequence numbers. This is what caused Bacula to fail the bls (the -p
> option allows it to continue). I haven't looked at this yet, but will start
> looking at the VirtualFull code to see if it was an oversight on my part. If
> you know more about how the Volume was created and what kind of Jobs are on
> it, please let me know as it may help get to the bottom of the problem.
In addition to my description above, here is the FileSet:
FileSet {
  Name = "FileSetX"
  Ignore FileSet Changes = yes
  Include {
    Options {
      signature = MD5
      compression = GZIP9
      readfifo = yes
    }
    File=/chroot/write/html
    File=/chroot/write/lib/squirrelmail/data
    File=/chroot/write/var/spool/filter/quarantine
    File=/chroot/write/var/spool/mail
    File=/var/spool/postfix
    File=/chroot/write/share
    File=/write/home
    File=/chroot/write/precache
    File=/chroot/write/var/lib/mysql/logs
    File=/var/log
    # Fifo output from clientrunscriptbackup (backup-out and mysqldump)
    File=/var/spool/backup-out/MYSQLPIPE
    File=/var/spool/backup-out/SYSTEMPIPE
    File=/var/spool/backup-out/SPAMPIPE
  }
  Exclude {
    File=/chroot/write/share/backup
  }
}
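For completeness, the FIFOs listed above are fed by a ClientRunBeforeJob
script, roughly along these lines (a sketch; the Job name and script path are
assumptions, only the MYSQLPIPE path and the use of mysqldump are from my
actual setup):

```
Job {
  Name = "BackupJob"
  # Starts the dump writers before the FD opens the FIFOs (readfifo = yes):
  Client Run Before Job = "/usr/local/bin/backup-out.sh start"
  ...
}

# Inside backup-out.sh, something like:
#   mysqldump --all-databases > /var/spool/backup-out/MYSQLPIPE &
```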
> The problem with FileIndex records out of order is that restore by file will
> not work correctly, even a full restore may not get all the records. The
> records *can* be extracted but to get them all might require editing the bsr
> file or using bextract without any bsr ... This is not good.
>
> There is a bug report open on restore problems related to VirtualFull jobs,
> so
> possibly this is related. I will look into that.
>
> Could you give me the exact commands that caused this problem in the
> beginning? I.e. you refer to restore bug -- option 12. I would like to see
> what you wanted to do with option 12. The more info I have the easier it
> will be to find and fix the problem. Many thanks.
As I say, 'restore bug option 12' was something different.
I first noticed the problem that this mail was about when bacula-sd started
crashing on scheduled VirtualFulls for some of our users. So there was no
command in the beginning, other than scheduling VirtualFulls.
I had a look at bugs.bacula.org, and found Bug #1410, which looked like it
might be related, and discovered that all my VirtualFulls report
'Record header file index X not equal record index Y'.
> I will report back when I have more info ...