Hello,
Actually, the bug seems to have disappeared from the tracker. I have, however,
noticed an anomaly in the tracker dates: #2648 was submitted in November
2021, while #2649 was submitted in June 2022. Furthermore, I still have the
automated messages Mantis sent for the bug I originally reported in May 2022,
which falls between these two dates. I deduce that, for some reason, around
the end of May or the start of June, the bug tracker may have suffered some
kind of data corruption and had to be restored from a backup taken at the end
of 2021, losing a few months of bug reports (around 15-20 bugs, judging by the
missing IDs), including mine.
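
The reasoning above can be sketched as a small consistency check, assuming the tracker assigns IDs sequentially, so submission dates should never decrease as IDs increase. The function name is made up for illustration, and the days of the month for #2648 and #2649 are placeholders, since only the month and year are known.

```python
from datetime import date


def find_out_of_order(reports):
    """Return (lower_id, higher_id) pairs where the higher ID has an earlier date.

    `reports` maps issue ID -> submission date.
    """
    ordered = sorted(reports.items())
    violations = []
    for (low_id, low_date), (high_id, high_date) in zip(ordered, ordered[1:]):
        if high_date < low_date:
            violations.append((low_id, high_id))
    return violations


# What the tracker shows today. Days are placeholders (assumption):
# only "November 2021" and "June 2022" are known.
tracker_now = {
    2648: date(2021, 11, 1),
    2649: date(2022, 6, 1),
}

# What the Mantis e-mails recorded for the original report.
from_emails = {2664: date(2022, 5, 12)}

violations = find_out_of_order({**tracker_now, **from_emails})

# IDs 2649 through 2664 had already been issued before the suspected restore,
# so at least this many reports were lost (a lower bound, not an exact count).
lost_at_least = max(from_emails) - 2648
```

Here `violations` pairs #2649 with #2664: the higher ID carries the earlier date, which is what a restore from an older backup followed by ID reuse would produce.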
I have included all related Mantis e-mails I received during that period, which
include the original report and the comments made by the Bacula team. Below, I
have attempted to reconstruct the gist of our discussion, relying on memory for
my own messages (which did not trigger automated e-mails):
--
Me : <Initial report>
ebollengier : Did you reproduce on 11.0.6 or 11.3.3?
Me : No, official Debian packages do not go past 9.6. Do you have an idea of
which commits could have fixed the issue?
ebollengier : Commit from 22 Oct ("Fix #5461 #5513 #4717 About WroteVol
non-zero message")
Me : This looks like it. Do you have a specific patch corresponding to the
commit so that the Debian team can backport the fix to 9.6?
ebollengier : No, we do not support the 9.X versions anymore, there are too
many differences with v11
Me : <Short explanation about Debian keeping major versions frozen during a
release but still manually backporting fixes from newer versions>
The last message was confusing to me, but I never received clarification -
probably because the bug disappeared from the tracker.
--
JC
From: "Carsten Leonhardt" <[email protected]>
To: "Julien Chiaramello" <[email protected]>
Cc: [email protected]
Sent: Tuesday, August 16, 2022 3:24:30 AM
Subject: Re: Bug#1012301: bacula: Corruption of File media during concurrent
backups
Hi Julien,
Julien Chiaramello <[email protected]> writes:
> This bug did not happen before we implemented Concurrent Jobs
>
> The bug has been declared upstream : https://bugs.bacula.org/view.php?id=2664
thanks for your bug report. Just one thing - can you confirm the
upstream bug number? Currently the highest bug number is 2659, so you
probably have a typo in there.
Regards
Carsten
--
Julien Chiaramello
Systems and Networks Engineer
+33 (0)4 83 65 12 30
[email protected]
www.quantificare.com
980 Avenue de Roumanille • Fairway bât. D
06410 Biot • FRANCE
--- Begin Message ---
A NOTE has been added to this issue.
----------------------------------------------------------------------
(0008656) ebollengier (manager) - 2022-05-23 09:42
https://bugs.bacula.org/view.php?id=2664#c8656
----------------------------------------------------------------------
The debian team can get a "bug" request to ask them to upgrade the software, for
a particular issue, it sounds complicated.
Our debian packages are provided with a debian APT repository, more information
can be found on www.bacula.org
Thanks
----------------------------------------------------------------------
--- End Message ---
--- Begin Message ---
A NOTE has been added to this issue.
----------------------------------------------------------------------
(0008654) ebollengier (manager) - 2022-05-23 09:35
https://bugs.bacula.org/view.php?id=2664#c8654
----------------------------------------------------------------------
Hello,
No, unfortunately we cannot support all versions, the patch might be ok, but
it's hard to say.
The difference between 11.0 and 9.4 represents years of work and fixes. The next
major version will be out in a few weeks. We also provide debian/ubuntu packages
on the web site.
Thanks
----------------------------------------------------------------------
--- End Message ---
--- Begin Message ---
A NOTE has been added to this issue.
----------------------------------------------------------------------
(0008652) ebollengier (manager) - 2022-05-23 09:22
https://bugs.bacula.org/view.php?id=2664#c8652
----------------------------------------------------------------------
The patch included in 11.x might fix the issue, for example:
```
Author: Eric Bollengier <[email protected]>
Date: Tue Oct 22 14:40:15 2019 +0200
Fix #5461 #5513 #4717 About WroteVol non-zero message
When this message is printed, we have seen data corruptions on
volumes that are currently mounted. This patch should prevent
that situation to happen.
```
----------------------------------------------------------------------
--- End Message ---
--- Begin Message ---
The following issue has been ASSIGNED.
======================================================================
https://bugs.bacula.org/view.php?id=2664
======================================================================
Reported By: jc.of.qc
Assigned To: ebollengier
======================================================================
Project: Bacula Bug Reports
Issue ID: 2664
Category: Storage Daemon
Tags: bacula-sd
Reproducibility: random
Severity: major
Priority: normal
Status: feedback
======================================================================
Date Submitted: 2022-05-12 12:33 CEST
Last Modified: 2022-05-23 09:04 CEST
======================================================================
Summary: Corruption of re-usable File media during concurrent
backups
Description:
Under a specific type of configuration, a Bacula job may sometimes corrupt a
previously written volume, losing all data on it. The following circumstances
have been identified:
- Multiple concurrent jobs are started at the same time, all using the same
Schedule and Pool
- That Pool must have a Volume Use Duration which is longer than the interval
between the Jobs in the Schedule (for example, hourly backups with a VUD of 2 hours)
- The Pool uses a Device which uses File media
Once these conditions are met, a job may randomly corrupt a volume,
typically when that volume is marked as "Used". This has the following
consequences:
- The Job status is "OK -- with warnings"
- The Job includes the following error from the "mount.c" file: "Hey!!!!!
WroteVol non-zero !!!!!"
- One of the previously written volumes is marked in Error
- That volume's size on the filesystem drops below 1 kB
- Attempting to restore files from a volume in error fails (ending up as a
mismatch)
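
To spot affected jobs after the fact, saved job output can be scanned for the message quoted above. This is a minimal sketch: the function name is hypothetical, the pattern only assumes the "WroteVol non-zero" wording given in this report (it may differ between Bacula versions), and the sample text is invented for illustration, not real Bacula output.

```python
import re

# Assumption: the wording matches the message as quoted in this report.
WROTEVOL_PATTERN = re.compile(r"WroteVol\s+non-zero", re.IGNORECASE)


def find_wrotevol_warnings(job_output):
    """Return (1-based line number, stripped text) for each matching line."""
    return [
        (number, line.strip())
        for number, line in enumerate(job_output.splitlines(), start=1)
        if WROTEVOL_PATTERN.search(line)
    ]


# Invented sample text, only to exercise the function.
sample_output = (
    "first line of a hypothetical job report\n"
    "Hey!!!!! WroteVol non-zero !!!!!\n"
    "last line of a hypothetical job report\n"
)

hits = find_wrotevol_warnings(sample_output)
```

A non-empty result only indicates that the warning was printed; whether a volume was actually damaged still has to be checked against the volume status and its size on disk.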
Steps to Reproduce:
Configure a Bacula cluster with the following conditions:
- A Device must use "Media Type = File"
- A Pool must have a certain Volume Use Duration (for example, 2 hours)
- A Schedule must run regular jobs at an interval shorter than the Volume
Use Duration of the Pool (for example, every hour)
- Multiple Jobs must be using this Schedule and Pool
- The jobs must run concurrently
Under these conditions, a job will eventually corrupt a previously written
volume
Additional Information:
This bug happens on various releases of Bacula from the official Debian packages
(5.2, 7.4 and 9.4 are affected)
This bug happens on multiple separate Bacula clusters (nothing is shared
between them)
In case it matters, the FDs use PKI Signatures and Encryption
This bug does not happen if the Volume Use Duration is set shorter than the
interval between backups, ensuring a given Volume is never re-used between "batches"
of backups (this is our current workaround)
This bug did not happen before we implemented Concurrent Jobs
Attached files (anonymized) include the director and SD configs of an affected
cluster, a sample subconfig file for a client and an example of the output of a job
where the corruption occurs
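
The conditions and the workaround described in this report can be summarised as a small predicate. This is a sketch of the reporter's observations only, not of Bacula's internal logic. The function and parameter names are hypothetical, and the case where the Volume Use Duration equals the backup interval is not covered by the report, so it is treated here as not matching.

```python
from datetime import timedelta


def matches_reported_conditions(uses_file_media, concurrent_jobs,
                                volume_use_duration, backup_interval):
    """True when a setup matches every condition listed in the report.

    The strict comparison reflects the report's wording: corruption was seen
    with a Volume Use Duration longer than the backup interval, and not seen
    with a shorter one. Equal values are an untested case (assumption).
    """
    return (
        uses_file_media
        and concurrent_jobs > 1
        and volume_use_duration > backup_interval
    )


# Example from the report: hourly backups with a VUD of 2 hours.
at_risk = matches_reported_conditions(
    True, 4, timedelta(hours=2), timedelta(hours=1)
)

# Workaround from the report: VUD shorter than the backup interval.
# The 30-minute value and the job count of 4 are illustrative assumptions.
with_workaround = matches_reported_conditions(
    True, 4, timedelta(minutes=30), timedelta(hours=1)
)
```

With the report's example values the predicate returns True, and with a Volume Use Duration shorter than the interval it returns False.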
======================================================================
----------------------------------------------------------------------
(0008649) ebollengier (manager) - 2022-05-23 09:04
https://bugs.bacula.org/view.php?id=2664#c8649
----------------------------------------------------------------------
Hello,
Thanks for the detailed report. Did you try to reproduce this problem with the
latest version? (11.0.6 or 11.3.3)
I believe that we have fixed a couple of things that might trigger this
situation.
Thanks
Issue History
Date Modified Username Field Change
======================================================================
2022-05-12 12:33 jc.of.qc New Issue
2022-05-12 12:33 jc.of.qc Tag Attached: bacula-sd
2022-05-12 12:33 jc.of.qc File Added: bacula-dir.conf
2022-05-12 12:33 jc.of.qc File Added: bacula-sd.conf
2022-05-12 12:33 jc.of.qc File Added: client.conf
2022-05-12 12:33 jc.of.qc File Added: fail.txt
2022-05-23 09:04 ebollengier Assigned To => ebollengier
2022-05-23 09:04 ebollengier Status new => feedback
2022-05-23 09:04 ebollengier Note Added: 0008649
======================================================================
--- End Message ---