Hello,
Every case of this particular error message that I have seen has been
due to data corruption outside of Bacula. Typically this happens when
a disk drive is bad, but since you are running ZFS and its checksums
are good, I can see only a few other possibilities:
1. The ZFS code is buggy. A current distribution with the ZFS kernel
module should not have this problem, but if you are running something
a bit older, or using a user-space file system rather than the kernel
module, you could have problems.
2. You have bad cables or a bad disk controller.
3. You seem to be using 5.2.x FDs with 5.0.x Director/SD,
which is not supported. Your FDs should never be a higher
version than the DIR/SD, but may be lower. In addition, your
DIR and SD must always be the same version.
Oops, I just re-read your email and probably point 1 does not apply
since you seem to be running ZFS on Solaris so there is little or no
possibility that the code is bad.
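To make point 3 concrete, here is a rough Python sketch of the version
rule as stated above: the DIR and SD must match, and no FD may be newer
than the DIR/SD. It is only an illustration, not part of Bacula. The
function names and the client name "sample-client" are made up, and
comparing all three version components is my assumption about what
"same version" means.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as '5.2.13' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def check_versions(director: str, storage: str, clients: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    Rules checked (as stated in this email):
      * Director and Storage daemon must be the same version.
      * A File daemon may be older than the Director/SD, never newer.
    """
    problems: List[str] = []
    dir_v = parse_version(director)
    sd_v = parse_version(storage)

    if dir_v != sd_v:
        problems.append(f"DIR {director} and SD {storage} differ")

    for name, version in clients.items():
        if parse_version(version) > dir_v:
            problems.append(f"FD {name} ({version}) is newer than DIR {director}")

    return problems


if __name__ == "__main__":
    # Versions taken from the original report; the client name is hypothetical.
    for line in check_versions("5.0.2", "5.0.2", {"sample-client": "5.2.13"}):
        print(line)
```

With the versions from your report, this flags the 5.2.13 FD as newer
than the 5.0.2 Director.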
Best regards,
Kern
On 01/14/2014 06:04 PM, Roberts, Ben wrote:
>
> Hi all,
>
>
>
> I've recently set up a new Bacula director/storage daemon in
> preparation to move our existing backups to newer hardware. During
> testing, I've run into problems doing restores of backups taken to
> disk, failing with the messages:
>
>
>
> Error: block.c:275 Volume data error at 24:4294944994! Wanted ID:
> "BB02", got "". Buffer discarded.
>
> Fatal error: fd_cmds.c:169 Command error with FD, hanging up.
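A side note on reading the position in that message: as far as I know,
for disk volumes the address is printed as file:block, where the two
numbers are the high and low 32 bits of the byte offset into the volume
file. Assuming that holds for your version, a short Python sketch (the
function name is mine, not a Bacula API) converts the address into an
offset you could inspect with a hex dump tool:

```python
def volume_address_to_offset(address: str) -> int:
    """Convert a 'file:block' address into a byte offset.

    Assumption: for file (disk) volumes, 'file' holds the high 32 bits
    and 'block' the low 32 bits of the offset into the volume file.
    """
    file_part, block_part = address.split(":")
    file_index = int(file_part)
    block = int(block_part)
    if file_index < 0 or not 0 <= block < 2**32:
        raise ValueError(f"address out of range: {address!r}")
    return (file_index << 32) | block


if __name__ == "__main__":
    # Address copied from the error message quoted above.
    print(volume_address_to_offset("24:4294944994"))  # prints 107374160098
```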
>
>
>
> Similar errors are reported for both file-level backups, and
> block-level backups made using bpipe. I've seen the instructions in
> http://www.bacula.org/en/dev-manual/main/main/Restore_Command.html#SECTION0021100000000000000000,
> but these only seem to apply to tape backups rather than disk ones.
> Regardless, I've tried stripping the positional information from the
> bootstrap file with no effect.
>
>
>
> Some relevant notes from my testing:
>
> - The issue does not affect every backup made, but does
> affect a significant proportion tested.
>
> - A single job can be affected at multiple locations, i.e.
> skipping one affected file might see the job fail again at a
> subsequent file.
>
> - Attempting to restore the same job multiple times elicits
> failures at the same block each time. Re-running the backup job may
> produce a restorable backup, or one that fails at a different
> location. Other jobs fail at different locations.
>
> - All data is stored on ZFS, which reports completely clean
> of any checksum errors at the filesystem level
>
> - The server is not reporting any hardware issues, e.g.
> corrected or uncorrectable memory reads, disk accesses etc.
>
> - The backup jobs are multiple TB in size, and restores
> frequently fail within the first couple hundred GB.
>
> - The storage daemon is configured with a disk-changer backed
> autochanger, writing to 100GB volumes, all residing within the same
> ZFS filesystem (sitting atop a large RAID-Z2 disk array).
>
>
>
> The director is running "Version: 5.0.2 (28 April 2010)
> i386-pc-solaris2.10 solaris 5.10" (compiled on solaris 5.10, running
> on 5.11). Storage daemon runs on the same machine as the director.
> (I'm loosely tied to this version so the director can interact with a
> storage daemon on another machine connected to a tape changer).
>
> A sample client is running "Version: 5.2.13 (19 February 2013)
> i386-pc-solaris2.11 solaris 5.11".
>
>
>
> From my understanding of how the Bacula components fit together, I
> suspect the corruption must be happening in the Storage daemon (since
> this is the only component that would be interested in the BB02 block
> header?) before the data is written to disk (otherwise ZFS would be
> reporting read/write errors).
>
>
>
> Is this an issue that's been seen before on other disk backups? Can
> anyone provide any assistance in locating and fixing the cause of the
> corruption? Any help would be greatly appreciated.
>
>
>
> Regards,
>
>
>
> Ben Roberts
>
> IT Infrastructure
>
>
>
> --- Relevant config excerpts:
>
>
>
> Autochanger {
>   Name = backup3-autochanger
>   Device = drive-restore-backup3, drive-1-backup3
>   Device = drive-2-backup3, drive-3-backup3
>   Device = drive-4-backup3, drive-5-backup3
>   Changer Device = /data2/bacula/storage/backup3-autochanger.conf
>   Changer Command = "/opt/bacula/etc/disk-changer %c %o %S %a %d"
> }
>
> Device {
>   Name = drive-1-backup3
>   Archive Device = /data2/bacula/storage/backup3-autochanger/drive1
>   Device Type = File
>   Media Type = File-backup3
>   AutoChanger = yes
>   Removable media = no
>   Random access = yes
>   Requires Mount = no
>   Always Open = no
>   Label Media = yes
>   Maximum Changer Wait = 180
>   Drive Index = 1
>   Maximum Spool Size = 100G
> }
> ...
>
> Storage {
>   Name = backup3-sd
>   Address = backup3.local
>   Device = backup3-autochanger
>   Media Type = File-backup3
>   Autochanger = yes
> }
>
> Pool {
>   Name = Disk-45Day-backup3
>   Pool Type = Backup
>   Recycle = yes
>   AutoPrune = yes
>   Job Retention = 45 days
>   Volume Retention = 45 days
>   Label Format = Disk-45Day-backup3-
>   Storage = backup3-sd
>   Maximum Volume Bytes = 100G
> }
>
>
> ------------------------------------------------------------------------
> This email and any files transmitted with it contain confidential and
> proprietary information and is solely for the use of the intended
> recipient. If you are not the intended recipient please return the
> email to the sender and delete it from your computer and you must not
> use, disclose, distribute, copy, print or rely on this email or its
> contents. This communication is for informational purposes only. It is
> not intended as an offer or solicitation for the purchase or sale of
> any financial instrument or as an official confirmation of any
> transaction. Any comments or statements made herein do not necessarily
> reflect those of GSA Capital. GSA Capital Partners LLP is authorised
> and regulated by the Financial Conduct Authority and is registered in
> England and Wales at Stratton House, 5 Stratton Street, London W1J
> 8LA, number OC309261. GSA Capital Services Limited is registered in
> England and Wales at the same address, number 5320529.
>
>
>
> ------------------------------------------------------------------------------
> CenturyLink Cloud: The Leader in Enterprise Cloud Services.
> Learn Why More Businesses Are Choosing CenturyLink Cloud For
> Critical Workloads, Development Environments & Everything In Between.
> Get a Quote or Start a Free Trial Today.
> http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
>
>
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users