Hello Phil,
There was one major change between Bacula versions 9.6.3 and 9.6.5:
smartalloc (used for all Bacula malloc/realloc/...) now returns memory
buffers that are zeroed rather than buffers filled with a specific
pattern. This in itself would not cause any problems, but with buffers
pre-zeroed by the malloc routines, some (not all) of the places that did
a malloc() followed immediately by zeroing the buffer can dispense with
the zeroing. This is a small performance improvement. Any bug this
introduces should depend only on a single component running 9.6.5 having
a strange bug (a buffer not zeroed when it should have been), not on the
mix of versions. As long as all components of Bacula running on a given
machine are running 9.6.5, there should be no problem. Of course, it is
always possible that at some strange place in the code the smartalloc
code is not being used, and if that is the case, we could end up with
non-zero buffers.
From what I hear from you, the bug involves several different Bacula
versions running on different machines, and never happens when
everything is running version 9.6.5.
Best regards,
Kern
On 8/12/20 7:39 PM, Phil Stracchino wrote:
> On 2020-08-12 12:53, Michael Plante wrote:
>> Since you have a good release and a bad release, you could try
>> git-bisect to identify (or at least narrow down) which commit introduced
>> the problem. I can't remember which system you said you had the
>> director on, but did you say you are compiling the director yourself or
>> at least are able to? How many days of running do you think it takes to
>> definitively say if the bug is present/absent in a given configuration?
>> Do your distro's patches (if any) differ between 9.6.3/5?
>
> Good questions.
>
> I have one client-only install running Fedora 31 and using the Bacula
> Community binary package. (It seems no more or less likely than any
> other client to have a hung job.) I have one system, my NAS, running
> client and my storage daemon, compiled locally from the git source on
> Solaris 11.3 using Developer Studio 12.4.
>
> All other systems running Bacula, including the director, are Gentoo
> Linux and locally compiled. Gentoo has no Bacula patches for 9.6.5,
> and only one 9.6.3 patch, which modifies only static linking and does
> not come into play because I'm not building statically.
>
> I'm currently on, I think, day 10 with no hung jobs since rolling the
> Director back to 9.6.3 and returning to the load-balanced DB
> configuration. No matter how I configure DB connection, I've yet to
> manage to go more than about 3-4 days at a time without a hung job on
> director 9.6.5, and it's not uncommon to see multiple hung jobs in a
> single night's run.
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel