Hello Phil,

There was one major change between Bacula versions 9.6.3 and 9.6.5: smartalloc (used for all Bacula malloc/realloc/...) now returns memory buffers that are zeroed rather than buffers filled with a specific pattern.  This in itself would not cause any problems, but with buffers pre-zeroed by the malloc routines, some (not all) of the places that did a malloc() followed immediately by zeroing the buffer can dispense with the zeroing.  This is a small performance improvement.  Any bug that shows up should depend only on one of the components running 9.6.5 having a strange bug (a buffer not zeroed when it should have been).  As long as all components of Bacula running on a given machine are running 9.6.5, there should be no problem.  Of course, it is always possible that at some strange place in the code the smartalloc code is not being used, and if that is the case, we could end up with non-zero buffers.
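
To make the failure mode concrete, here is a minimal sketch in C. It is not the actual smartalloc code: alloc_patterned(), alloc_zeroed(), all_zero(), struct job_record and new_job_record() are hypothetical names invented for illustration, and the 0x55 fill byte is an arbitrary choice. It only shows why a caller that has dropped its own zeroing is correct with a zeroing allocator and wrong with any allocation path that does not zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the older behaviour: new buffers are filled
 * with a recognisable non-zero pattern (0x55 is an arbitrary choice). */
static void *alloc_patterned(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) {
        memset(p, 0x55, n);
    }
    return p;
}

/* Hypothetical stand-in for the newer behaviour: buffers come back zeroed. */
static void *alloc_zeroed(size_t n)
{
    return calloc(1, n);
}

/* Returns 1 if every byte of buf is zero, 0 otherwise. */
static int all_zero(const void *buf, size_t n)
{
    const unsigned char *b = buf;
    for (size_t i = 0; i < n; i++) {
        if (b[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Illustrative record type; not a real Bacula structure. */
struct job_record {
    int job_id;
    int status;
    char name[32];
};

/* A caller whose explicit memset() has been removed.  It is only correct
 * when the allocator passed in guarantees zeroed memory. */
static struct job_record *new_job_record(void *(*alloc)(size_t))
{
    struct job_record *jr = alloc(sizeof *jr);
    /* No memset(jr, 0, sizeof *jr) here: the allocator is trusted to zero. */
    return jr;
}
```

Calling new_job_record(alloc_zeroed) yields a record for which all_zero() returns 1, while new_job_record(alloc_patterned) yields one full of the fill byte.  The second case corresponds to a code path that bypasses the zeroing allocator after the caller's own zeroing has been removed.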

From what I hear from you, the bug involves several different Bacula versions running on different machines, and never happens when everything is running version 9.6.5.

Best regards,

Kern

On 8/12/20 7:39 PM, Phil Stracchino wrote:
On 2020-08-12 12:53, Michael Plante wrote:
Since you have a good release and a bad release, you could try
git-bisect to identify (or at least narrow down) which commit introduced
the problem.  I can't remember which system you said you had the
director on, but did you say you are compiling the director yourself or
at least are able to?  How many days of running do you think it takes to
definitively say if the bug is present/absent in a given configuration?
Do your distro's patches (if any) differ between 9.6.3/5?

Good questions.

I have one client-only install running Fedora 31 and using the Bacula
Community binary package.  (It seems no more or less likely than any
other client to have a hung job.)  I have one system, my NAS, running
client and my storage daemon, compiled locally from the git source on
Solaris 11.3 using Developer Studio 12.4.

All other systems running Bacula, including the director, are Gentoo
Linux and locally compiled.   Gentoo has no Bacula patches for 9.6.5,
and only one 9.6.3 patch, which modifies only static linking and does
not come into play because I'm not building statically.

I'm currently on, I think, day 10 with no hung jobs since rolling the
Director back to 9.6.3 and returning to the load-balanced DB
configuration.  No matter how I configure DB connection, I've yet to
manage to go more than about 3-4 days at a time without a hung job on
director 9.6.5, and it's not uncommon to see multiple hung jobs in a
single night's run.




_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
