Hello, 

Actually, the bug seems to have disappeared from the tracker. I have, however,
noticed an anomaly in the tracker dates: #2648 was submitted in November
2021, while #2649 was submitted in June 2022. Furthermore, I still have the
automated messages Mantis sent for the bug I originally reported in May 2022,
which falls between these two dates. I deduce that, around the end of May or
the start of June, the bug tracker may have suffered some kind of data
corruption and was restored from a backup taken at the end of 2021, losing a
few months of bug reports (around 15-20 bugs, judging by the missing IDs),
including mine.

I have included all the related Mantis e-mails I received during that period,
which include the original report and the comments made by the Bacula team. I
have also attempted to reconstruct the general meaning of the discussion we
had, based on my memory of what I wrote (my own comments did not trigger
automated messages).

-- 

Me : <Initial report> 

ebollengier : Did you reproduce on 11.0.6 or 11.3.3 ? 

Me : No, official Debian packages do not go past 9.6. Do you have an idea of
which commit could have fixed the issue ?

ebollengier : Commit from 22 Oct ("Fix #5461 #5513 #4717 About WroteVol 
non-zero message") 

Me : This looks like it. Do you have a specific patch corresponding to the 
commit so that the Debian team can backport the fix to 9.6 ? 

ebollengier : No, we do not support the 9.X versions anymore, there are too 
many differences with v11 

Me : <Short explanation about Debian keeping major versions frozen during a 
release but still manually backporting fixes from newer versions> 

The last message was confusing to me, but I never received clarification - 
probably because the bug disappeared from the tracker. 

-- 

JC 


From: "Carsten Leonhardt" <[email protected]> 
To: "Julien Chiaramello" <[email protected]> 
Cc: [email protected] 
Sent: Tuesday, August 16, 2022 3:24:30 AM 
Subject: Re: Bug#1012301: bacula: Corruption of File media during concurrent 
backups 

Hi Julien, 

Julien Chiaramello <[email protected]> writes: 

> This bug did not happen before we implemented Concurrent Jobs 
> 
> The bug has been declared upstream : https://bugs.bacula.org/view.php?id=2664 

thanks for your bug report. Just one thing - can you confirm the 
upstream bug number? Currently the highest bug number is 2659, so you 
probably have a typo in there. 

Regards 

Carsten 

-- 
Julien Chiaramello 
Systems and Networks Engineer 

+33 (0)4 83 65 12 30 
[email protected] 
www.quantificare.com 

980 Avenue de Roumanille • Fairway bât. D 
06410 Biot • FRANCE 
--- Begin Message ---
A NOTE has been added to this issue.

---------------------------------------------------------------------- 
 (0008656) ebollengier (manager) - 2022-05-23 09:42
 https://bugs.bacula.org/view.php?id=2664#c8656 
---------------------------------------------------------------------- 
The debian team can get a "bug" request to ask them to upgrade the software,
for a particular issue, it sounds complicated.

Our debian packages are provided with a debian APT repository, more
information can be found on www.bacula.org

Thanks
----------------------------------------------------------------------


--- End Message ---
--- Begin Message ---
A NOTE has been added to this issue.

---------------------------------------------------------------------- 
 (0008654) ebollengier (manager) - 2022-05-23 09:35
 https://bugs.bacula.org/view.php?id=2664#c8654 
---------------------------------------------------------------------- 
Hello,

No, unfortunately we cannot support all versions, the patch might be ok, but
it's hard to say.
The difference between 11.0 and 9.4 represents years of work and fixes. The
next major version will be out in few weeks. We provide also debian/ubuntu
packages on the web site.

Thanks
----------------------------------------------------------------------


--- End Message ---
--- Begin Message ---
A NOTE has been added to this issue.

---------------------------------------------------------------------- 
 (0008652) ebollengier (manager) - 2022-05-23 09:22
 https://bugs.bacula.org/view.php?id=2664#c8652 
---------------------------------------------------------------------- 
The patch included in 11.x might fix the issue for example
```
Author: Eric Bollengier <[email protected]>
Date:   Tue Oct 22 14:40:15 2019 +0200

    Fix #5461 #5513 #4717 About WroteVol non-zero message
    
    When this message is printed, we have seen data corruptions on
    volumes that are currently mounted. This patch should prevent
    that situation to happen.
```
----------------------------------------------------------------------


--- End Message ---
--- Begin Message ---
The following issue has been ASSIGNED. 
====================================================================== 
https://bugs.bacula.org/view.php?id=2664 
====================================================================== 
Reported By:                jc.of.qc
Assigned To:                ebollengier
====================================================================== 
Project:                    Bacula Bug Reports
Issue ID:                   2664
Category:                   Storage Daemon
Tags:                       bacula-sd
Reproducibility:            random
Severity:                   major
Priority:                   normal
Status:                     feedback
====================================================================== 
Date Submitted:             2022-05-12 12:33 CEST
Last Modified:              2022-05-23 09:04 CEST
====================================================================== 
Summary:                    Corruption of re-usable File media during concurrent
backups
Description: 
Under a specific type of configuration, a Bacula job may sometimes corrupt a
previously written volume, losing all data on it. The following circumstances
have been identified :

- Multiple concurrent jobs are started at the same time, all using the same
Schedule and Pool
- That Pool must have a Volume Use Duration which is higher than the frequency
of the Jobs in the Schedule (For example, hourly backups with a VUD of 2 hours)
- The Pool uses a Device which uses File media

Once these conditions are met, a job may randomly corrupt a volume,
typically when that volume is marked as "Used". This has the following
consequences :

- The Job status is "OK -- with warnings"
- The Job includes the following error from the "mount.c" file : "Hey!!!!!
WroteVol non-zero !!!!!"
- One of the previously written volumes is marked in Error
- That volume size on the filesystem drops below 1 kB
- Attempting to restore files from a volume in error fails (Ending up as a
mismatch)

Steps to Reproduce: 
Configure a Bacula cluster with the following conditions :
- A Device must use "Media Type = File"
- A Pool must have a certain Volume Use Duration (for example, 2 hours)
- A Schedule must perform regular jobs with a higher frequency than the Volume
Use Duration of the Pool (for example, every hour)
- Multiple Jobs must be using this Schedule and Pool
- The jobs must run concurrently

Under these conditions, a job will eventually corrupt a previously written
volume
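
As a rough illustration (not taken from the attached configs), the conditions
above can be written as a small Python check. The function name and its
parameters are hypothetical. The case where the Volume Use Duration equals the
job interval is not covered by the observations in this report, so it is
treated as "not at risk" here, which is only an assumption.

```python
from datetime import timedelta


def volume_reuse_risk(media_type, volume_use_duration, job_interval,
                      concurrent_jobs):
    """Return True when every condition listed in the report holds.

    Hypothetical helper: it only restates the reported conditions and
    does not inspect or parse any real Bacula configuration.
    """
    return (
        media_type == "File"                      # Device uses File media
        and concurrent_jobs > 1                   # several jobs run together
        and volume_use_duration > job_interval    # volume reused across batches
    )


# Example from the report: hourly backups with a 2-hour Volume Use Duration.
at_risk = volume_reuse_risk("File", timedelta(hours=2), timedelta(hours=1), 4)

# Workaround from the report: Volume Use Duration shorter than the interval.
worked_around = volume_reuse_risk(
    "File", timedelta(minutes=30), timedelta(hours=1), 4
)
```

With the values above, `at_risk` is True and `worked_around` is False.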

Additional Information: 
This bug happens on various releases of Bacula from the official Debian packages
(5.2, 7.4 and 9.4 are affected)

This bug happens on multiple separated Bacula clusters (Nothing is shared
between them)

In case it matters, the FDs use PKI Signatures and Encryption

This bug does not happen if the Volume Use Duration is set lower than the
frequency of backups, ensuring a given Volume is never re-used between "batches"
of backups (This is our current workaround)

This bug did not happen before we implemented Concurrent Jobs

Attached files (anonymized) include the director and SD configs of an affected
cluster, a sample subconfig file for a client and an example of output of a job
where the corruption occurs
====================================================================== 

---------------------------------------------------------------------- 
 (0008649) ebollengier (manager) - 2022-05-23 09:04
 https://bugs.bacula.org/view.php?id=2664#c8649 
---------------------------------------------------------------------- 
Hello,

Thanks for the detailed report. Did you try to reproduce this problem with the
latest version? (11.0.6 or 11.3.3)
I believe that we have fixed a couple of things that might trigger this
situation.

Thanks

 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2022-05-12 12:33 jc.of.qc       New Issue                                    
2022-05-12 12:33 jc.of.qc       Tag Attached: bacula-sd                      
2022-05-12 12:33 jc.of.qc       File Added: bacula-dir.conf                    
2022-05-12 12:33 jc.of.qc       File Added: bacula-sd.conf                    
2022-05-12 12:33 jc.of.qc       File Added: client.conf                      
2022-05-12 12:33 jc.of.qc       File Added: fail.txt                         
2022-05-23 09:04 ebollengier    Assigned To               => ebollengier     
2022-05-23 09:04 ebollengier    Status                   new => feedback     
2022-05-23 09:04 ebollengier    Note Added: 0008649                          
======================================================================


--- End Message ---
