Anthony,

Your proposal to keep a local copy on the stack is the only real solution. Anything else would hold the mutex for too long. I just saw your solution on GitHub.

As Philipp said, we're happy to have your PR. Thanks, and have a good start to 2020.

Frank


On 27.12.19 at 10:49, Philipp Storz wrote:
Hello Anthony,

thank you very much for your contribution and the detailed information.

A pull request would be exactly the right thing to get your changes upstream.

Because of the holiday season the response might take some time.

Thank you very much and have a nice holiday season.

Philipp


On 27.12.19 at 08:41, Anthony Vaccaro wrote:
Hi all,

I've been tracking down an issue we've been seeing on our Bareos install,
where restore jobs submitted at the same time end up with the same job name,
which results in one of the jobs being rejected by the storage daemon and
failing.

I think I've found the issue in the source code, inside the
CreateUniqueJobName function:

https://github.com/bareos/bareos/blob/1417cff723cca48eccba156eca8fd90b99fbe122/core/src/dird/job.cc#L1503

The seq variable is incremented while the mutex is held, which should be
safe, but its value is then read into the JobControlRecord after the mutex
has been released. That is a race condition if other threads modify the value
in between.

I've written a short program to verify this (for my own understanding as much
as anyone else's). In it I've also attempted to fix the issue by copying seq
into a non-static local variable (i.e. one that lives on each thread's own
stack) while the mutex is held, and then using that copy for the printf
statement after the mutex is released. It seems to work.

https://gist.github.com/WaryWolf/ea7d524f96725d823aae5d96a3727442
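
The pattern described above can be sketched roughly as follows. This is a
minimal C++ illustration, not the actual Bareos code and not the contents of
the gist: the names NextSeqRacy, NextSeqFixed, CollectSeqs, AllUnique and
seq_mutex are hypothetical, and everything else CreateUniqueJobName does is
left out.

```cpp
// Sketch only: hypothetical names, not the actual Bareos source.
#include <cassert>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

static std::mutex seq_mutex;  // protects seq
static int seq = 0;           // shared counter, standing in for the real seq

// Racy pattern: seq is incremented under the lock but read after unlocking,
// so another thread may have incremented it again in between.
// Shown for comparison only; it is never called below.
int NextSeqRacy()
{
  seq_mutex.lock();
  seq++;
  seq_mutex.unlock();
  return seq;  // unsynchronized read: two callers can see the same value
}

// Fixed pattern: copy seq into a stack variable while still holding the lock.
int NextSeqFixed()
{
  int my_seq;
  {
    std::lock_guard<std::mutex> guard(seq_mutex);
    my_seq = ++seq;
  }
  return my_seq;  // private copy, safe to use after unlocking
}

// Calls NextSeqFixed() from several threads and returns every value obtained.
std::vector<int> CollectSeqs(int threads, int calls_per_thread)
{
  assert(threads > 0 && calls_per_thread > 0);
  std::vector<std::vector<int>> per_thread(threads);  // one result list each
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; t++) {
    workers.emplace_back([&per_thread, t, calls_per_thread] {
      for (int i = 0; i < calls_per_thread; i++) {
        per_thread[t].push_back(NextSeqFixed());
      }
    });
  }
  for (auto& w : workers) { w.join(); }
  std::vector<int> all;
  for (const auto& v : per_thread) {
    all.insert(all.end(), v.begin(), v.end());
  }
  return all;
}

// True if no sequence number was handed out twice.
bool AllUnique(const std::vector<int>& values)
{
  return std::set<int>(values.begin(), values.end()).size() == values.size();
}
```

With the fixed version, AllUnique(CollectSeqs(8, 1000)) should hold, since
each caller keeps the value it saw while holding the lock. The mutex is held
only for the increment and the copy, so the lock time stays short.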

I'm happy to submit a PR for this; I just wanted to confirm via the mailing
list first, as I'm relatively new to this project.

I've attached a sample from our bareos.log showing this bug in action. I've
stripped out some unnecessary lines, but hopefully the log still makes sense:
the AfterJob script for jobs 198059 and 198072 creates jobs 198166 and 198168
respectively, which both have the name "archive.2019-12-25_18.34.31_17".
198166 starts successfully, but when 198168 starts, the storage daemon
rejects it because a job with the same name has already been authenticated.

Thanks and regards,

Anthony Vaccaro

--
You received this message because you are subscribed to the Google Groups 
"bareos-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to
bareos-devel+unsubscr...@googlegroups.com 
<mailto:bareos-devel+unsubscr...@googlegroups.com>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/bareos-devel/051989ea-b9d8-4186-8439-9a3a96db85c4%40googlegroups.com
<https://groups.google.com/d/msgid/bareos-devel/051989ea-b9d8-4186-8439-9a3a96db85c4%40googlegroups.com?utm_medium=email&utm_source=footer>.

--
Kind regards

 Frank Ueberschar                          frank.uebersc...@bareos.com
 Bareos GmbH & Co. KG                      Phone: +49 221 63 06 93-88
 http://www.bareos.com                     Fax:   +49 221 63 06 93-10

 Registered office: Köln | Amtsgericht Köln (local court): HRA 29646
 Managing directors: S. Dühr, M. Außendorf, J. Steffens, P. Storz

